Main Figure


This is a draft of the main figure. Outgroups are not included in the analysis below.

Genome-wide recombination rate estimates

Female Specific Analysis

## 
##   WSB     G   LEW   PWD   MSM  MOLF SKIVE   KAZ  CAST 
##    14    12     9    15    14     1     1     9     1

G and LEW have higher strain means than WSB. MSM (and PWD) also have elevated means.

The variance among mouse means also differs across strains.

In the Dom subspecies, mouse means range from 22.167 to 27.429 in WSB, 23.769 to 30.364 in G, and 23.533 to 28.897 in LEW.

In the Musc subspecies, mouse means range from 22 to 30.091 in PWD and 23.84 to 27.45 in KAZ.

In MSM, mouse means range from 24.9 to 31.733.

Two GLMs were fitted to test the effects of strain and subspecies.

##             Estimate Pr(>|t|)
## (Intercept) 24.71164  0.00000
## subspCast    1.28836  0.48177
## subspMusc    0.85569  0.25905
## subspMol     2.90736  0.11511
## strainG      3.29752  0.00001
## strainLEW    1.69647  0.02729
## strainPWD    0.30253  0.68472
## strainMSM    0.06986  0.96952
## strainSKIVE  0.37067  0.84220
##             Estimate Pr(>|t|)
## (Intercept) 24.71164  0.00000
## strainG      3.29752  0.00001
## strainLEW    1.69647  0.02729
## strainPWD    1.15822  0.08104
## strainMSM    2.97721  0.00003
## strainMOLF   2.90736  0.11511
## strainSKIVE  1.22636  0.50303
## strainKAZ    0.85569  0.25905
## strainCAST   1.28836  0.48177

The tables above give the coefficients for the two GLMs of female mouse-average MLH1 counts, which include subspecies and strain as fixed effects. G has the most consistently significant strain effect across both models. MSM has a low p-value in the second model, and LEW has a marginally significant p-value.

The G mean is 1.1003204 times the mean of the other strains. The corresponding ratios are 1.0374241 for LEW and 1.0877373 for MSM. These three strains will be designated 'moderate high rec' strains for later sex-specific analysis.
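The ratios above can be sketched in Python. Each is a strain's mean divided by the mean of the comparison mice, like the mean(...)/mean(...) ratio used for the male data below. This is a minimal sketch that assumes mouse-level means are stored as (strain, mean CO) pairs. The function name fold_vs_rest, its exclude option and the example numbers are all hypothetical.

```python
from statistics import fmean

def fold_vs_rest(mouse_means, focal, exclude=()):
    """Ratio of the focal strain's mean to the mean of the comparison mice.

    mouse_means: list of (strain, mouse_mean_CO) pairs (hypothetical layout).
    exclude: strains left out of the comparison group (e.g. other high-rec strains).
    """
    focal_vals = [m for s, m in mouse_means if s == focal]
    rest_vals = [m for s, m in mouse_means if s != focal and s not in exclude]
    if not focal_vals or not rest_vals:
        raise ValueError("need at least one mouse in each group")
    return fmean(focal_vals) / fmean(rest_vals)

# Hypothetical mouse means, for illustration only
data = [("G", 27.0), ("G", 29.0), ("WSB", 24.0), ("WSB", 26.0), ("LEW", 28.0)]
print(round(fold_vs_rest(data, "G"), 4))                    # 1.0769 (28 / 26)
print(round(fold_vs_rest(data, "G", exclude=("LEW",)), 4))  # 1.12 (28 / 25)
```

The exclude option reflects a design choice: the value of a ratio depends on which strains make up the comparison group.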

Male Specific Analysis

The male-specific analysis was done to assess the variance across strains. The plots below show mouse-level means of MLH1 per cell, separated by subspecies.


  • There is a low degree of strain variance in Dom, with mouse means ranging from 22.167 to 30.364.

  • Musc and Mol have much more variance across means, with mouse means ranging from 21.867 to 31.633 in Musc and from 23.182 to 33.038 in Molossinus.

  • While there is a lot of variance within strains, PWD, SKIVE and MSM can generally be classified as 'high rec' strains; their strain averages are

Two models were fitted to the male specific data to test the effects of strain and subspecies.

##             Estimate Pr(>|t|)
## (Intercept) 24.45317  0.00000
## strainG     -0.25628  0.64213
## strainLEW    0.59053  0.35227
## strainPERC  -1.64517  0.28680
## strainPWD    4.85321  0.00000
## strainMSM    7.00108  0.00000
## strainMOLF  -0.21317  0.77317
## strainSKIVE  2.62233  0.00063
## strainKAZ   -0.21986  0.71046
## strainTOM    0.24683  0.82703
## strainAST    0.84583  0.37670
## strainCZECH -1.25950  0.18934
## strainCAST  -1.44417  0.20325
## strainHMI   -0.09892  0.90777

The strain averages for PWD, MSM and SKIVE are r mean(PWD_male$mean_co) /mean(sans_LewRec_male$mean_co), 1.2923918 and 1.1124778 times the means of the other strains, respectively. Because of these ratios and the significant strain effects, these three strains are designated the high rec group for later analyses.

Analysis for Evolutionary Patterns

To interpret the variance within an evolutionary framework, we fit the mouse gwRRs to a mixed model with subspecies, sex and their interaction as fixed effects. The subspecies effect is a proxy for the degree of divergence across subspecies. Strain was coded as a random effect to approximate standing genetic variation across subspecies ranges.

This is the basic mixed model we used as a framework to analyze all the MLH1 count data.

\[mouse \ av.\ metric ~=~ subsp * sex + rand(strain) + \varepsilon \]

The mixed model separates these effects to show how each one affects the mean CO count of a mouse. The subspecies effect is a proxy for divergence, and the random strain effect is a proxy for polymorphism.

This is the breakdown of mice used in the mixed model:

WSB G LEW PWD MSM MOLF SKIVE KAZ
female 14 12 9 15 14 1 1 9
male 10 11 7 8 4 6 5 11

add number of cells

WSB G LEW PWD MSM MOLF SKIVE KAZ
female 1 1 1 1 1 1 1 1
male 1 1 1 1 1 1 1 1

\[mouse \ av.\ metric ~=~ subsp * sex + rand(1|strain) + \varepsilon\]

This model allows a random effect of strain. It treats the wild-derived inbred strains as random samples from each subspecies' range.

## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)

The p-value for the interaction of the two fixed effects is 2.0448538 × 10^-4. The random strain effect is estimated as 0.

GLM follow-up

To follow up on the mixed model results, we also fitted GLMs / linear models to get a clearer picture of the strain-by-sex interaction effects.

\[mouse \ av.\ metric ~=~ subsp * sex * strain + \varepsilon\]

Post hoc model for investigating strain as a fixed effect.

##                     Estimate Pr(>|t|)
## (Intercept)         24.71164  0.00000
## subspMol             2.90736  0.09633
## subspMusc            0.85569  0.23441
## sexmale             -0.15644  0.82201
## strainG              3.29752  0.00000
## strainLEW            1.69647  0.01941
## strainPWD            0.30253  0.66932
## strainMSM            0.06986  0.96795
## strainSKIVE          0.37067  0.83416
## subspMol:sexmale    -3.22256  0.09905
## subspMusc:sexmale   -0.80789  0.43176
## sexmale:strainG     -3.17191  0.00165
## sexmale:strainLEW   -1.19095  0.27809
## sexmale:strainPWD    4.40084  0.00005
## sexmale:strainMSM    6.94639  0.00092
## sexmale:strainSKIVE  2.26233  0.25652

The table above gives the coefficients for MM.3 (HQ data set, fitted with lm()), with sex, strain and subsp as interacting fixed effects.

Strain G is significant (p < 0.00001 in the table above), consistent with the female-specific result.

The sex-by-strain interaction terms are significant for G, MSM and PWD.

How does this model differ from the previous one? MM4 omits subsp, so it includes only sex * strain effects.

\[mouse \ av.\ metric ~=~ sex * strain + \varepsilon\]

##                     Estimate Pr(>|t|)
## (Intercept)         24.71164  0.00000
## sexmale             -0.15644  0.82201
## strainG              3.29752  0.00000
## strainLEW            1.69647  0.01941
## strainPWD            1.15822  0.06536
## strainMSM            2.97721  0.00001
## strainMOLF           2.90736  0.09633
## strainSKIVE          1.22636  0.48097
## strainKAZ            0.85569  0.23441
## sexmale:strainG     -3.17191  0.00165
## sexmale:strainLEW   -1.19095  0.27809
## sexmale:strainPWD    3.59295  0.00054
## sexmale:strainMSM    3.72384  0.00196
## sexmale:strainMOLF  -3.22256  0.09905
## sexmale:strainSKIVE  1.45444  0.46010
## sexmale:strainKAZ   -0.80789  0.43176
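These are treatment-coded coefficients, so a single strain/sex cell mean is the sum of the terms that apply to that cell. WSB has no strain coefficient, so it appears to be the reference level, which would make the intercept the WSB female mean. Below is a minimal Python sketch that uses a subset of the coefficients printed in the sex * strain table above. The helper predicted_mean is hypothetical.

```python
# Coefficients copied from the sex * strain table above (treatment coding).
# Assumption: WSB is the reference strain and female the reference sex.
coef = {
    "(Intercept)": 24.71164,
    "sexmale": -0.15644,
    "strainG": 3.29752,
    "strainPWD": 1.15822,
    "sexmale:strainG": -3.17191,
    "sexmale:strainPWD": 3.59295,
}

def predicted_mean(strain, sex, coef, reference="WSB"):
    """Sum the treatment-coded terms that apply to one strain/sex cell."""
    total = coef["(Intercept)"]
    if sex == "male":
        total += coef["sexmale"]
    if strain != reference:
        total += coef["strain" + strain]
        if sex == "male":
            total += coef["sexmale:strain" + strain]
    return total

for strain in ("WSB", "G", "PWD"):
    for sex in ("female", "male"):
        print(strain, sex, round(predicted_mean(strain, sex, coef), 5))
```

In this fit, the large negative sexmale:strainG term nearly cancels strainG. As a result, predicted G males (about 24.68) sit close to WSB males (about 24.56), while PWD males (about 29.31) are well above PWD females (about 25.87).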

Summary of Mixed Model Analyses

mouse average MLH1 count

Several models were fitted to the mouse-average CO counts to test alternative evolutionary models. The results were mixed and nuanced.

First full mixed model: no significant fixed factors (the random effect might be significant).

In the second model, where strain was nested and random, the fixed effects of sex and the subsp-by-sex interaction are much more significant. The coefficients indicate that males have about 1 fewer CO on average, and the Musc and Mol subspecies have about 3 more on average.

Variance effect

**Do these patterns hold across quality bins?**

Within Mouse Variance in CO Count per Cell

Histograms may not be the best representation, but they show the general pattern that female mice have higher within-mouse variation in CO counts.

In general, the female data have higher within-mouse variance (both variance and CV), with some strains showing this more than others (investigate potential outliers). The three LEW females with the highest variance all come from different dissection dates. This pattern also holds for within-mouse CV of MLH1 count (not shown).

Check the one PWD female with near-zero variance: mouse 8oct14_PWD_f2 has only 3 cells. Make sure it is removed from this table.
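Within-mouse variance and CV can be computed for each mouse before the table is built. Mice with too few cells, such as the 3-cell mouse above, are dropped at this step. The sketch below is a minimal Python version that assumes a dict of per-cell MLH1 counts keyed by mouse ID. The min_cells cutoff of 10 and the example IDs are hypothetical and are not the document's actual filter.

```python
from statistics import fmean, variance

def within_mouse_stats(cells_by_mouse, min_cells=10):
    """Per-mouse variance and CV of per-cell MLH1 counts.

    cells_by_mouse: dict mapping mouse ID -> list of per-cell MLH1 counts.
    min_cells: hypothetical cutoff; mice with fewer cells are skipped.
    """
    out = {}
    for mouse, counts in cells_by_mouse.items():
        if len(counts) < max(min_cells, 2):
            continue  # too few cells for a usable variance estimate
        var = variance(counts)           # sample variance (n - 1), like R's var()
        cv = var ** 0.5 / fmean(counts)  # SD / mean, like sd(x) / mean(x) in R
        out[mouse] = {"n": len(counts), "var": var, "cv": cv}
    return out

# Hypothetical IDs and counts
stats = within_mouse_stats({"mouse_3cells": [25, 25, 25], "mouse_A": [22, 24, 26, 28]}, min_cells=4)
print(stats)  # only mouse_A is kept
```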

There should be four mixed models: models 1-2 from the full data set (variance and CV) and models 3-4 from a higher-quality data set (variance and CV). Only a subset of the model outputs is shown.

For the mixed models, the individual fixed effects are tested with an anova() of the full and reduced models. The p-value for the interaction fixed effect comes from drop1(), a likelihood-ratio test (LRT).
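The likelihood-ratio test behind these full vs. reduced comparisons takes twice the difference in log-likelihood. It compares that statistic to a chi-square distribution whose degrees of freedom equal the number of dropped parameters. When the fixed effects differ between models, the fits must be ML rather than REML, which is why lme4's anova() refits REML models with ML. Below is a standard-library Python sketch using hypothetical log-likelihoods. chi2_sf relies on closed forms that are valid only for integer df.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc term plus a finite series in half-integer powers of y
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def lrt(loglik_full, loglik_reduced, df):
    """Likelihood-ratio test of nested ML fits: (statistic, p-value)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)

stat, p = lrt(-100.0, -103.0, df=1)  # hypothetical log-likelihoods
print(stat, round(p, 4))
```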

\[mouse \ metric ~=~ subsp * sex + rand(strain) + \varepsilon\]

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## refitting model(s) with ML (instead of REML)

In general, sex has a highly significant effect (p-values near 0). This applies to the different measures of within-animal (across-cell) variance and also to the subset of data with higher-quality cells.

LEW females appear to have high variance, and for CV the LEW strain alone shows increased CV. Consider plotting the variance and looking deeper into this pattern. Most of the strain-specific differences in within-mouse variance disappear in the Q12 data set. (These strain effects could indicate that the distribution of across-cell variance in CO number evolves.)

The second set of models analyzed have this form.

\[mouse \ CO \ metric ~=~ subsp * sex * strain + \varepsilon\]

The GLMs support the mixed models, with sex as the consistently significant effect.

Summary for within-mouse variance

Across all the models, sex has a highly significant effect. The sex-by-subsp interaction p-values were larger, and the interaction was significant in only one model.

The models run on higher-quality cells suggest that staining or technical error in the female data is unlikely to be the only cause of the increased variance in the number of MLH1 foci per oocyte.

(Some strains have significant effects; some LEW females had large leverage.)

DMC1 Results, Spermatocytes

These data test whether CO precursors, double-strand breaks (DSBs), are significant predictors of MLH1 count variation. Put simply, do mice with more DSBs also have more COs?

DMC1 foci were scored from leptotene and zygotene spermatocytes of juvenile mice (12-14-18 days). One mouse represents each strain. The main comparison is between the high (PWD and MSM) and low (WSB, G, KAZ) recombining strains.

The p-values for the differences between time points are 1.0320907 × 10^-5, 1.0978829 × 10^-4 and 0.017492 for all observations, the high rec group and the low rec group, respectively.


The difference between groups is observed only at the earlier leptotene stage (p-values are from t-tests).

The correlation between MLH1 counts and leptotene DMC1 counts is 0.8736143.

The correlation between MLH1 counts and zygotene DMC1 counts is 0.284302.
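The sketch below assumes these are Pearson correlations (the default of R's cor()) between per-strain MLH1 and DMC1 means. With one mouse per strain, n is just the number of strains, so each r rests on very few points and should be read cautiously. This is a minimal Python sketch, and pearson_r is a hypothetical helper.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of paired values (the default method of R's cor())."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired, with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy per-strain means (one mouse per strain), for illustration only
mlh1 = [1, 2, 3, 4]
dmc1 = [1, 3, 2, 4]
print(pearson_r(mlh1, dmc1))  # 0.8
```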

Because there is evidence of unequal variance across strains for zygotene cells, lm() estimates of strain effects should not be relied on.

The t-tests of the high vs. low groups indicate that the high recombining strains have significantly more foci in leptotene (L) cells (p = 0.0021446). In zygotene (Z) cells there is no significant difference between the high and low groups (p = 0.6627983).
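R's t.test() defaults to the Welch version (var.equal = FALSE), which does not assume equal variances. That is a reasonable choice given the unequal variances noted above. The sketch below is a minimal Python version of the Welch statistic and the Satterthwaite degrees of freedom. It assumes the t-tests used the default settings. Turning t into a p-value needs a t-distribution CDF (such as R's pt()), which is omitted here.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least 2 observations")
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy foci counts for two groups, for illustration only
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 4), round(df, 4))  # -1.8974 5.8824
```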

Chromosome Class Proportions

To decompose the cell-wide rate, we looked at the proportions of chromosomes with different numbers of COs. The two plots show the chromosome class proportions from the hand-measured data and from the curated BivData.


These results compare the proportions of bivalents with 0, 1, 2 or 3 COs. Most of the variation in gwRR across strains is due to more 2CO bivalents at the 'expense' (trade-off) of 1CO bivalents.

Almost all of the p-values for the proportion tests are significant, indicating slight but significant shifts across the chromosome classes. However, the most striking male pattern is the proportion of 2COs.
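One common proportion test is the pooled two-proportion z-test. Its z² equals the X-squared statistic of R's prop.test() with correct = FALSE, while the default continuity correction gives a slightly larger p-value. Below is a minimal Python sketch that compares, for example, the count of 2CO bivalents out of all scored bivalents in two groups. The counts are hypothetical, and it is an assumption that the document's tests used this exact form.

```python
from math import erfc, sqrt

def two_prop_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; z**2 equals prop.test(..., correct = FALSE)'s X-squared."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = erfc(abs(z) / sqrt(2))  # = 2 * P(Z > |z|)
    return z, p_two_sided

# Hypothetical: 50 of 100 bivalents are 2CO in one group vs 30 of 100 in another
z, p = two_prop_ztest(50, 100, 30, 100)
print(round(z, 3), round(p, 4))
```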

As previously reported for house mice, the most prevalent class of bivalents is the 1CO class. The high rec males are the exception, which fits the conclusion that higher cell-wide CO counts are due to more bivalents having 2 COs instead of 1.

  • The high rec female strains, G and LEW, have significantly more 2CO bivalents.

  • The most striking overall male pattern is the gradient of 2CO proportions:

MSM 60% > PWD 50% > SKIVE 30% > remaining (low) strains 10-20%

Single Bivalent Level Results

Bivalent-level traits and metrics are added in the src/Setup_BivData.Rmd script. These observations come from the automated image analysis algorithm and have been curated (incorrect algorithm output was discarded). The MLH1 data file is also loaded in that script.

The breakdown of single bivalent observations by category

These are the numbers of bivalent observations with filled-in hand.foci.count values.

Validity of comparing bivalent observations

The automated software doesn't isolate every bivalent from each cell (on average 17 per cell), but we assume that the isolation process is not biased. Because there are hundreds of observations per category, we assume that each of the 19 autosomes is equally represented in the single-bivalent data set.

Two main questions drive these analyses.

Q1. Which traits are sexually dimorphic?

Q2. Which traits fit male polymorphism predictions?

These questions use different data sets. Q1 uses a data set with only sex-matched strains. Q2 uses all male observations, including strains not in the Q1 set.

The mouse averages for 3 bivalent level metrics will be analyzed across these questions:

  • SC lengths (or total.SC)

  • Normalized Rec landscape of 1CO

  • IFDs of 2CO

The same basic models used for the MLH1 counts per cell will be used here. For Q2 they are supplemented with basic t-tests and logistic regression models to distinguish between high and low recombining strains.

Q1. Model 1, mixed model lmer()
\[mouse \ av.\ metric ~=~ subsp * sex + rand(strain) + \varepsilon \]

Q1. Model 2, glm() \[mouse \ av.\ metric ~=~ sex * strain + \varepsilon \]

Q2 Model 3, logistic regression

\[Rec \ group ~=~ mouse \ av.\ metric \]

In the chunk above the mouse averages table is made – may need to add all the extra metrics (IFD, .
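The mouse averages table collapses bivalent-level rows to one value per mouse, skipping missing values. Below is a minimal Python sketch that assumes each row is a dict with a 'mouse' key. The column name 'sc_length' and the example rows are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def mouse_averages(bivalents, metric):
    """Mean of one bivalent-level metric per mouse, skipping missing values (like na.rm = TRUE)."""
    by_mouse = defaultdict(list)
    for row in bivalents:
        value = row.get(metric)
        if value is not None:
            by_mouse[row["mouse"]].append(value)
    return {mouse: fmean(vals) for mouse, vals in by_mouse.items()}

# Hypothetical bivalent rows
rows = [
    {"mouse": "m1", "sc_length": 100.0},
    {"mouse": "m1", "sc_length": 120.0},
    {"mouse": "m2", "sc_length": 90.0},
    {"mouse": "m2", "sc_length": None},
]
print(mouse_averages(rows, "sc_length"))  # {'m1': 110.0, 'm2': 90.0}
```

Additional metrics such as IFD can be added by calling the same function once per column.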

Q1 Analysis: Predictions for Heterochiasmy

Using the mixed-model framework, which tests the effects (and interactions) of subspecies, sex and strain, I will test for evolution of the following traits.

Two bivalent-level traits are predicted to display heterochiasmy (i.e., significant effects of sex):

  1. SC length will be sexually dimorphic (the sex effect will be significant) (cite Lynn).

  2. A) Normalized 1CO positions will be sexually dimorphic (Sardell and Kirkpatrick).

     B) Sister cohesion tension (sis-co-ten) will be sexually dimorphic, as it reflects the general property of uniform vs. telomere-biased CO positioning.

     C) Centromere and telomere distances will be sexually dimorphic.

  3. Interference / IFD will not be sexually dimorphic. Previous physical measures of interference did not differ between sexes (verify Petkov 2001).

Q1 SC Lengths

We expect female SC lengths to be longer (refs). In the plot above, SC length increases with the number of hand-counted foci. Chromosome classes above 3 that don't have a higher mean likely reflect small sample sizes. Almost all 0CO chromosomes are around the same size as the 1CO distribution. Add the code for the plot of 2CO males and females.

M1. subsp * sex + (strain)

M2. subsp * sex * strain (slightly redundant model)

M3. sex * strain

The dependent variables I’ll be testing in the model framework are:

  • Pooled SC lengths

  • 1CO

  • 2CO

  • 3CO

  • Xlong.biv (don't use the long bivData for testing the sex effect, since it would include the XX)

  • Short.biv dataset (this would test a chromosome-size effect without the XX)

\[mouse \ average \ SC \ length ~=~ subsp * sex + rand(strain) + \varepsilon \]

Q1.SC_HetC_M1

Insert the mixed-model (lmer) results with 1CO as the dependent variable, and for all the other metrics (pooled SC, 1CO, 2CO, etc.).

Below are the mixed-model results; try to organize them into a table.

## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 6.0335, p-value = 0.0042
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 4.2988, p-value = 0.0133
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 0.074804, p-value = 0.3099
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 0.10951, p-value = 0.2911
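The RLRT blocks above look like output of exactRLRT() from the R package RLRsim. Roughly, that function simulates the finite-sample null distribution of the restricted likelihood ratio statistic for a random effect and compares the observed statistic against those draws. The Python sketch below is a rough stand-in. It simulates the large-sample null for a single variance component, which is a 50:50 mixture of a point mass at 0 and a chi-square with 1 df. Because this mixture is only an approximation, its p-values need not match the simulated finite-sample p-values printed above.

```python
import math
import random

def mc_pvalue(observed, simulated):
    """Share of simulated null statistics at least as large as the observed one."""
    return sum(s >= observed for s in simulated) / len(simulated)

def simulate_mixture_null(n, seed=0):
    """Draws from 0.5 * chi2_0 + 0.5 * chi2_1, the large-sample null for one variance component."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < 0.5 else rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]

def mixture_pvalue(observed):
    """Closed-form tail probability of the same 50:50 mixture."""
    if observed <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(observed / 2.0))

sims = simulate_mixture_null(10000, seed=1)
for rlrt in (0.0, 0.51642, 6.0335):  # RLRT values printed in this document
    print(rlrt, mc_pvalue(rlrt, sims), round(mixture_pvalue(rlrt), 4))
```

Under either version, an observed RLRT of 0 gives p = 1.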

The fixed sex effect is highly significant for the mouse means of SC length. The data sets tested were all pooled SC lengths, the long bivalents, and the 1CO and 2CO bivalents separately.

Q1. SC_HetC_M2

All fixed and all interactions


I think the strange thing about Musc is that SKIVE females have shorter SC lengths.

Q1. SC Length t.test

Strain Comparisons of SC Length

Dive deeper into the sex-specific pattern for each strain. (I think the general pattern of longer female SCs holds for all within-strain comparisons across sexes.)

(Adding all of these blocks to the setup script; strain.Bivalents.DF)

XX adjustment

Generate intelligible summaries of the permutation/subsampling results.

Q1. IFD

Brief lit review for IFD / interference expectations
Petkov et al 2007, CO interference underlies sex differences in RR

“Here we show that in mice, this is because of a shorter genomic interference distance in females than in males, measured in Mb. However, the interference distance is the same in terms of bivalent length. We propose a model in which the interference distance in the two sexes reflects the compaction of chromosomes at the pachytene stage of meiosis.”

Chrm1 genetic map,

(Other human and mouse refs:

Tease and Hulten 2004: no difference between MLH1 foci in males and females.

DeBoer 2006 (13): measured chromosome 1 in males and females; both were 2.8 microns.)

Still working on the best way to display the general IFD patterns.

Mixed Model Tests, Fixed Effects

Mixed-model analysis of IFD (interference): the first set of models is fitted with the lme() functions.

\[mouse \ average \ IFD ~=~ subsp * sex + rand(strain) + \varepsilon \]

## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## boundary (singular) fit: see ?isSingular
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 0, p-value = 1
## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 0.51642, p-value = 0.183

The table above should show a slightly unusual pattern: the coefficients for the significant sex fixed effect are positive for the raw values and negative for the normalized values. That is, for raw IFD values females are significantly longer, but for normalized IFD values males are significantly longer.

I tested two versions of the mixed model for this trait: raw IFD and normalized IFD. The tables below are from anova() on the lmer models. The random effect of strain is not significant for absolute IFD and only slightly significant for IFD.PER.

\[mouse \ average \ IFD ~=~ subsp * sex * strain + \varepsilon \]

\[mouse \ average \ IFD ~=~ sex * strain + \varepsilon \]

##                     Estimate Pr(>|t|)
## (Intercept)         53.66276  0.00000
## sexmale             -5.71830  0.25778
## strainG              6.58451  0.17653
## strainLEW            2.62962  0.55994
## strainPWD            2.11197  0.63425
## strainMSM            6.73069  0.20687
## strainMOLF           0.10039  0.98408
## strainSKIVE         -1.29051  0.80772
## strainKAZ            8.78234  0.08413
## sexmale:strainG     -2.76753  0.66545
## sexmale:strainLEW    1.27919  0.84497
## sexmale:strainPWD    9.94899  0.12866
## sexmale:strainMSM   -0.22595  0.97533
## sexmale:strainSKIVE 13.17570  0.05885
## sexmale:strainKAZ   -8.70052  0.24423
##                     Estimate Pr(>|t|)
## (Intercept)          0.46186  0.00000
## sexmale              0.08073  0.00798
## strainG              0.02579  0.36744
## strainLEW            0.00616  0.81653
## strainPWD            0.01419  0.58778
## strainMSM            0.01279  0.68248
## strainMOLF          -0.01303  0.66054
## strainSKIVE          0.01988  0.52538
## strainKAZ            0.01957  0.50994
## sexmale:strainG     -0.02834  0.45325
## sexmale:strainLEW    0.00586  0.87914
## sexmale:strainPWD    0.04838  0.20894
## sexmale:strainMSM    0.01923  0.65563
## sexmale:strainSKIVE  0.07879  0.05542
## sexmale:strainKAZ   -0.04207  0.33882

In the mixed models of IFD, sex has a significant effect for both raw and normalized IFD. For nrm.IFD, subspecies is also significant.

The interaction effects were slightly significant for both raw and nrm.IFD.

The most significant value was the sex effect for nrm.IFD.

The random strain effect was not significant for either model.

For the raw measures in M2, I think the two SKIVE effects mean that female raw IFD is shorter than male IFD.raw. The other effects are for PWD and SKIVE (larger raw IFD relative to the intercept). I think the MSM and MOLF interaction effects were too far down the list to absorb any variance. In M3, only the SKIVE*male effect is close to significant.

For the normalized values, sex is a significant effect in both M2 and M3, increasing nrm.IFD in males. SKIVE*male is the only other consistently significant effect, and it also increases nrm.IFD.

Overall, there are few significant effects across the 2CO IFD measures. This might indicate that interference is conserved across these samples and/or that chromosome-specific effects add too much noise.

Strain Comparisons

Dive deeper into the sex-specific pattern for each strain. The code chunks below show the unusual sex-specific results for IFD measures. The general pattern is that female raw IFD > male raw IFD, while female PER IFD < male PER IFD. The scatter plots show that female raw measures are longer than male ones. For the PER values, the female mean is pulled down by an enrichment of short IFDs.

For some strains (PWD, MSM and SKIVE), there is a 30% threshold in the male PER IFD distributions. (What does that mean?) How can this pattern be tested or quantified? A cluster metric?

The table above gives the proportion of 2CO bivalents with a normalized IFD below 30%. For all strains except KAZ, females have a greater proportion of these short IFD values.
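The proportion of 2CO bivalents below the 30% cutoff can be sketched in Python as follows. The sketch assumes nrm.IFD is the distance between the two foci divided by SC length. The document's PER normalization may be defined differently, and the helper names and numbers are hypothetical.

```python
def normalized_ifd(pos1, pos2, sc_length):
    """Distance between the two foci of a 2CO bivalent as a fraction of SC length (assumed definition)."""
    if sc_length <= 0:
        raise ValueError("SC length must be positive")
    return abs(pos2 - pos1) / sc_length

def prop_below(values, threshold=0.3):
    """Fraction of values strictly below the threshold."""
    if not values:
        raise ValueError("no observations")
    return sum(v < threshold for v in values) / len(values)

# Hypothetical (pos1, pos2, SC length) triples, grouped by sex
bivalents = {
    "female": [(10.0, 30.0, 100.0), (20.0, 80.0, 120.0)],
    "male": [(5.0, 60.0, 90.0), (10.0, 75.0, 100.0)],
}
for sex, rows in bivalents.items():
    nrm = [normalized_ifd(p1, p2, length) for p1, p2, length in rows]
    print(sex, prop_below(nrm))
```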

In the WSB data, the male and female ranges of normalized IFD overlap more closely.

The LEW pattern doesn't have a clean nrm.IFD cut-off. The male and female ranges overlap, but there are more female observations below the cut-off.

For PWD, there are a few short IFDs in males, but there seems to be a cut-off/threshold at 0.3.

For KAZ, the distinction between the male and female patterns is less clear. There are fewer females with very close IFDs.

In the SKIVE data, it could be that the very short IFD measures in females are rare or form another class of observations.

The MSM pattern has a short range of nrm.IFD in males and a longer range in females.

##         
##            0   1   2   3
##   female   0   0   0   0
##   male     9 326  97   5

The strains that show a clean "30% threshold" for normalized IFD in males are PWD, SKIVE and MSM (the two high rec strains and one intermediate strain). The strains with more overlap between males and females are the Dom strains and KAZ.

IFD 3CO bivalents

Run comparisons for 3CO bivalents.

Q1. General CO Positions

Try accounting for chromosome size effects. Also note that this analysis mixes density plots and scatter + box plots across sections.

density plots

\[mouse \ average \ F1 \ position ~=~ subsp * sex + rand(strain) + \varepsilon \]

## refitting model(s) with ML (instead of REML)
## refitting model(s) with ML (instead of REML)
## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 13.42, p-value = 1e-04

\[mouse \ average \ F1 \ position ~=~ subsp * sex * strain + \varepsilon \]

##                     Estimate Pr(>|t|)
## (Intercept)          0.58939  0.00000
## subspMusc           -0.04232  0.07674
## subspMol            -0.11164  0.00142
## strainG             -0.03730  0.10652
## strainLEW           -0.04871  0.02662
## strainPWD            0.02478  0.15343
## strainMSM            0.09707  0.00006
## strainSKIVE          0.03011  0.16845
## sexmale              0.14216  0.00000
## subspMusc:sexmale   -0.00992  0.76666
## subspMol:sexmale    -0.02843  0.38886
## strainG:sexmale     -0.02468  0.39850
## strainLEW:sexmale    0.00959  0.74727
## strainPWD:sexmale   -0.03562  0.22582
## strainSKIVE:sexmale -0.02983  0.34596

\[mouse \ average \ F1 \ position ~=~ sex * strain + \varepsilon \]

The model results aren't as clear as I'd like. For Mol, both the sex effect and the MOLF strain effect are significant (this means both female and male rec landscapes are affected, mostly toward the middle).

Generally, sex is the biggest effect on the 1CO landscape (medial in females, telomeric in males). I'm not sure why MSM is so significant; there may be some unusual effect of the Mol subspecies, since MOLF isn't even in M2. Male is consistently significant across the two models.

Male and MOLF are the most significant effects for M3

The plot above focuses on normalized positions in 1CO bivalents, since CO interference controls the general positioning of COs when there are multiple COs. This plot shows the sexual dimorphism in the density plots.

Consider adding annotate_text for the number of observations in each category. Think about adding vertical lines for the centromere and for the position means. Think about removing the extra Musc strains.

These box plots show that females have a much more medial position of the single focus on 1CO bivalents (much closer to 50% than in males). They also show that the Foci1 position of Musc males is slightly more central/medial than in the Dom male strains. MOLF males have much more medial positions than other strains.

The distributions of SC length and sis-co-ten seem very different across sexes.

The mixed model data should only come from 1CO bivalent data.

The mouse-average Foci1 position is more significant in the t-test than in the logistic regression (is something wrong?). Check the mouse averages for F1_pos; there might be an outlier or a mouse with very few observations.

Siscoten

The metric Sis-co-ten measures the amount of sister cohesion connected to the other pole.

The logic of the sis-co-ten metric is outlined in the figure below. The goal is to use this metric to model how different numbers and placements of chiasmata/COs lead to different amounts of tension-active cohesion. The metric is calculated using SC area as a proxy for the amount of cohesion at metaphase.

from (Lee, J. (2019). Is age-related increase of chromosome segregation errors in mammalian oocytes caused by cohesin deterioration?. Reproductive Medicine and Biology.)


Males have much clearer separation of sis-co-ten across chromosome classes. This is emphasized when SC length is also plotted. Musc males seem to have higher values of this metric than Dom males.

To formally test the differences in sis-co-ten, I plan to write a subsampling/permutation loop that compares mean(sis.co.ten) for equal numbers of bivalents of the same class.
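The planned loop could take two forms. The first draws repeated equal-size subsamples of the two groups, so that the group with more bivalents does not dominate. The second is a permutation test that shuffles group labels to build a null distribution of the mean difference. Below is a minimal Python sketch that assumes the sis.co.ten values for one bivalent class are given as two lists (for example, female and male). The function names, defaults, seeds and example values are hypothetical.

```python
import random
from statistics import fmean

def subsampled_mean_diffs(a, b, n, reps=1000, seed=0):
    """mean(a) - mean(b) over repeated equal-size subsamples drawn without replacement."""
    rng = random.Random(seed)
    return [fmean(rng.sample(a, n)) - fmean(rng.sample(b, n)) for _ in range(reps)]

def permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means (shuffles group labels)."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    k = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(fmean(pooled[:k]) - fmean(pooled[k:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so the p-value is never exactly 0

# Hypothetical sis.co.ten values for one bivalent class
female = [float(v) for v in range(20)]
male = [v + 5.0 for v in female] + [30.0] * 10
diffs = subsampled_mean_diffs(female, male, n=15, reps=500, seed=2)
print(round(fmean(diffs), 2), permutation_pvalue(female, male, n_perm=2000, seed=3))
```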

However, females have a greater range, so this may just be a scale issue.


I think the normalized sis.co.ten plots also show more clustering of sis.co.ten in males.

The fixed effects, sex and sex*subsp are significant. The random strain effect is also significant.

Is the heterochiasmy prediction met?

Yes. In the model of mouse-average sis-co-ten, sex and the sex-by-subsp interaction are significant factors, and the random strain effect is also significant.

## 
## Call:
## glm(formula = Rec.group ~ mean.siscoten, family = binomial(link = "logit"), 
##     data = Male.poly.Mouse.Table_BivData_4MM[(Male.poly.Mouse.Table_BivData_4MM$subsp == 
##         "Musc"), ])
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.52564  -0.06612   0.00415   0.07733   1.40452  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -47.2084    30.9084  -1.527    0.127
## mean.siscoten   1.4513     0.9482   1.531    0.126
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 22.9145  on 17  degrees of freedom
## Residual deviance:  5.2122  on 16  degrees of freedom
## AIC: 9.2122
## 
## Number of Fisher Scoring iterations: 9
## 
## Call:
## glm(formula = Rec.group ~ SisCoTen, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2386  -0.8621  -0.7251   1.3557   1.7917  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.410699   0.087585 -16.107  < 2e-16 ***
## SisCoTen     0.014936   0.001939   7.701 1.35e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2761.4  on 2269  degrees of freedom
## Residual deviance: 2701.2  on 2268  degrees of freedom
##   (37 observations deleted due to missingness)
## AIC: 2705.2
## 
## Number of Fisher Scoring iterations: 4

Most of the sis.co.ten tests are highly significant. The exception above is the mouse-level logistic regression for Musc males (Wald p = 0.126). Maybe I should consider a normalized sis.co.ten? I think nrm_siscoten would still reflect the differing cohesion structure/outcome.
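The mouse-level fit above can be read in two ways. First, its coefficients give P(high rec) = 1/(1 + exp(-(b0 + b1 x))), which crosses 0.5 at x = -b0/b1. Second, its deviances give a likelihood-ratio test: the drop from 22.9145 to 5.2122 on 1 df is far larger than the Wald z-test (p = 0.126) suggests. Large estimates, large standard errors and 9 Fisher scoring iterations can signal near-complete separation, where Wald p-values are unreliable. In R, anova(fit, test = "Chisq") reports the likelihood-ratio version. Below is a minimal Python sketch using the printed numbers. It assumes the modelled level of Rec.group is the high rec group.

```python
import math

def logistic_prob(intercept, slope, x):
    """Probability of the modelled Rec.group level (assumed: high rec) under a logit link."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

def deviance_lrt_p(null_dev, resid_dev):
    """Likelihood-ratio p-value for one added predictor (1 df): chi-square tail of the deviance drop."""
    stat = null_dev - resid_dev
    return math.erfc(math.sqrt(stat / 2.0))

# Values printed for the mouse-level Musc glm (Rec.group ~ mean.siscoten)
b0, b1 = -47.2084, 1.4513
x50 = -b0 / b1                           # mean.siscoten where P = 0.5, about 32.5
p_lrt = deviance_lrt_p(22.9145, 5.2122)  # about 2.6e-05, vs. Wald p = 0.126
print(round(x50, 2), p_lrt)
```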

Telomere and centromere Distance

My telomere and centromere distance metrics measure the distance from the nearest focus to each end of the bivalent (SC). In the plots below, each point is a single bivalent. I chose not to use the centromere mark because it seems noisy and inconsistent…
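A minimal sketch of how per-bivalent end distances could be computed. It assumes foci positions (e.g. Foci1, Foci2) are measured from the centromeric end (0) toward the telomeric end (the SC length), and that the `_PER` variants are the raw distance divided by SC length. `end_distances` is a hypothetical helper, not the code that produced telo_dist or dis.cent.

```r
# Hypothetical helper: assumes foci are positioned from the centromeric end (0)
# toward the telomeric end (sc_length), in the same units as sc_length
end_distances <- function(foci, sc_length) {
  foci <- foci[!is.na(foci)]
  if (length(foci) == 0) {
    return(c(cent_dist = NA_real_, telo_dist = NA_real_,
             cent_dist_PER = NA_real_, telo_dist_PER = NA_real_))
  }
  cent <- min(foci)              # nearest focus to the centromeric end
  telo <- sc_length - max(foci)  # nearest focus to the telomeric end
  c(cent_dist = cent, telo_dist = telo,
    cent_dist_PER = cent / sc_length, telo_dist_PER = telo / sc_length)
}

# Example: a 2CO bivalent with SC length 100 and foci at 20 and 90
end_distances(c(20, 90), 100)
```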

## Warning: Removed 77 rows containing missing values (geom_point).

## Warning: Removed 78 rows containing missing values (geom_point).

On average, males have much lower raw telomere distances than females (reflecting the telomeric bias). In males, 2CO bivalents have very low telomere distances, while 1CO bivalents span a greater range. In females, the telomere-distance ranges of the two classes overlap much more.

## 
##  simulated finite sample distribution of RLRT.
##  
##  (p-value based on 10000 simulated values)
## 
## data:  
## RLRT = 6.7045, p-value = 0.0029

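Mixed model result summary: the RLRT above (RLRT = 6.70, p = 0.0029) indicates a significant random effect. The logistic regressions below predict rec group in Musc males from telomere distance: the mouse-average model is not significant (p = 0.586), raw bivalent telomere distance (telo_dist) is marginal (p = 0.059), and normalized telomere distance (telo_dist_PER) is significant (p < 0.001).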

## 
## Call:
## glm(formula = Rec.group ~ mean.telo.dist, family = binomial(link = "logit"), 
##     data = Mouse.Table_BivData_4MM[(Mouse.Table_BivData_4MM$subsp == 
##         "Musc") & (Mouse.Table_BivData_4MM$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8147   0.4434   0.5946   0.7428   0.8505  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)      4.5876     5.9959   0.765    0.444
## mean.telo.dist  -0.1719     0.3153  -0.545    0.586
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 15.012  on 14  degrees of freedom
## Residual deviance: 14.673  on 13  degrees of freedom
## AIC: 18.673
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ telo_dist, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8703  -0.8526  -0.8231   1.5327   1.7629  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.775595   0.069867 -11.101   <2e-16 ***
## telo_dist   -0.004915   0.002607  -1.886   0.0593 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2790.4  on 2303  degrees of freedom
## Residual deviance: 2786.7  on 2302  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 2790.7
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ telo_dist_PER, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9024  -0.8669  -0.8050   1.5031   1.7751  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.68815    0.07215  -9.538  < 2e-16 ***
## telo_dist_PER -0.74487    0.22630  -3.292 0.000996 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2790.4  on 2303  degrees of freedom
## Residual deviance: 2779.2  on 2302  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 2783.2
## 
## Number of Fisher Scoring iterations: 4
## Warning: Removed 126 rows containing missing values (geom_point).

## Warning: Removed 126 rows containing missing values (geom_point).

## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## 
## Call:
## glm(formula = Rec.group ~ mean.cent.dist, family = binomial(link = "logit"), 
##     data = Mouse.Table_BivData_4MM[(Mouse.Table_BivData_4MM$subsp == 
##         "Musc") & (Mouse.Table_BivData_4MM$sex == "male"), ])
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -6.308e-05   2.100e-08   2.100e-08   2.100e-08   6.409e-05  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)       3373.04 1607241.50   0.002    0.998
## mean.cent.dist     -79.92   38084.76  -0.002    0.998
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1.5012e+01  on 14  degrees of freedom
## Residual deviance: 8.0864e-09  on 13  degrees of freedom
## AIC: 4
## 
## Number of Fisher Scoring iterations: 25
## 
## Call:
## glm(formula = Rec.group ~ dis.cent, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0416  -0.8807  -0.7682   1.4201   2.0345  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.376794   0.093059  -4.049 5.14e-05 ***
## dis.cent    -0.012908   0.002227  -5.797 6.74e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2764.5  on 2271  degrees of freedom
## Residual deviance: 2729.5  on 2270  degrees of freedom
##   (35 observations deleted due to missingness)
## AIC: 2733.5
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ dis.cent.PER, family = binomial(link = "logit"), 
##     data = Curated_BivData[(Curated_BivData$subsp == "Musc") & 
##         (Curated_BivData$sex == "male"), ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1170  -0.8901  -0.7148   1.3584   1.8641  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.2106     0.0935  -2.252   0.0243 *  
## dis.cent.PER  -1.3810     0.1795  -7.694 1.42e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2764.5  on 2271  degrees of freedom
## Residual deviance: 2703.5  on 2270  degrees of freedom
##   (35 observations deleted due to missingness)
## AIC: 2707.5
## 
## Number of Fisher Scoring iterations: 4

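The normalized centromere plots show that in Musc males, on 2CO bivalents the first CO is closer to the centromeric end than in Dom males. In the Musc male logistic regressions, both raw (dis.cent) and normalized (dis.cent.PER) bivalent-level centromere distances are highly significant. The mouse-average model (mean.cent.dist) did not converge: its residual deviance is essentially 0 and its standard errors are enormous, indicating complete separation of the 15 mice, so its p-values (0.998) are not interpretable.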

Compared to males, females show more overlap in centromere distances across chromosome classes.
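The "glm.fit: fitted probabilities numerically 0 or 1 occurred" warning printed for the mean.cent.dist model usually signals separation. A small base-R check can flag this before reading p-values; the `has_separation` helper and its `eps` cutoff are assumptions for illustration. When separation occurs, a penalized (Firth) logistic regression, for example via the logistf package, is one common alternative.

```r
# Flag a binomial glm whose fitted probabilities sit at (numerically) 0 or 1
has_separation <- function(fit, eps = 1e-8) {
  p <- fitted(fit)
  any(p < eps | p > 1 - eps)
}

# Toy data: x perfectly separates y (all y = 1 above 5)
x <- 1:10
y_sep <- as.integer(x > 5)
fit_sep <- suppressWarnings(glm(y_sep ~ x, family = binomial))

# Toy data without separation
y_mix <- c(0, 1, 0, 1, 1, 0, 1, 1, 0, 1)
fit_mix <- glm(y_mix ~ x, family = binomial)

c(separated = has_separation(fit_sep), mixed = has_separation(fit_mix))
```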

Heterochiasmy Prediction

Is sex a significant effect on the normalized 1CO CO position (as predicted)?

Is the random strain effect significant?

The random strain effect seems very significant.

Remember to use the mouse-average table mouse.avs_4MM (I don't think I need the MELT data frame).

#make point plots / boxplots which show differences in mean positions 

#scatter + boxplot for t.tests

Dr. Broman suggested that the Kolmogorov-Smirnov test (curve comparison) wasn't the best test for differences in general CO position, and suggested simple t-tests on the positions instead.
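The two approaches can be compared side by side on toy data. A minimal base-R sketch; the two position vectors are invented for illustration, not study data. `t.test` compares mean positions (Welch's version by default), while `ks.test` compares the whole distributions.

```r
# Hypothetical normalized 1CO positions (Foci1 / chromosomeLength) for two groups
pos_female <- c(0.20, 0.31, 0.25, 0.35, 0.28)
pos_male   <- c(0.50, 0.55, 0.61, 0.45, 0.52)

# Welch two-sample t-test on mean position (R's default, var.equal = FALSE)
tt <- t.test(pos_female, pos_male)

# Two-sample Kolmogorov-Smirnov test on the whole distribution, for contrast
ks <- ks.test(pos_female, pos_male)

c(t_p = tt$p.value, ks_p = ks$p.value)
```

The t-test answers the narrower question of whether mean positions differ, which is what the position comparisons here are after.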

#try remaking the plot Megan suggested
# for 2CO bivalents, Foci1 position on x and Foci2 position on y

CurBivData_2CO <- Curated_BivData[Curated_BivData$hand.foci.count == 2,]

CurBivData_2CO <- CurBivData_2CO[!(is.na(CurBivData_2CO$Foci2) | CurBivData_2CO$Foci2==""), ]

#isolate 2COs
#facet by sex and subsp

library(ggplot2)
F1.x.F2 <- ggplot(CurBivData_2CO, aes(x = Foci1, y = Foci2, color = strain)) + geom_point() + facet_grid(subsp ~ sex) + ggtitle("Foci1 vs Foci2 positions, 2CO bivalents")
F1.x.F2

#what is the pattern of variance
#run analyses for each subsp*sex
#use non-melt DF

#how is the variance partitioned across
#cell, mouse, strain

female.Dom <- Curated_BivData[Curated_BivData$sex == "female",]
female.Dom <- female.Dom[female.Dom$subsp == "Dom",]

female.Dom$Foc1.PER <- female.Dom$Foci1 / female.Dom$chromosomeLength

#unorder strain and mouse

female.Dom$mouse <- as.factor(female.Dom$mouse)


female.Dom$strain <- factor(female.Dom$strain, ordered = FALSE)  # drop the ordering but keep the strain labels

female.Dom_1CO <- female.Dom[which(female.Dom$hand.foci.count == 1), ]  # which() also drops rows with NA counts

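#1CO first
#strain / mouse / fileName nests cells (fileName) within mice within strains
modo <- lm(Foc1.PER ~ strain / mouse / fileName, data = female.Dom_1CO)
anova(modo)

#with fileName entered first (fileName + mouse + strain), mouse and strain got no sum of squares,
#likely because each cell belongs to one mouse and strain, so those terms are aliased with fileName
#residual size decreases with per.F1
#residuals much larger than fileName, mouse and strain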

#model <- lm(breaks ~ wool * tension, 
#            data = warpbreaks, 
#            contrasts = list(wool = "contr.sum", tension = "contr.poly"))

male.Dom <- Curated_BivData[Curated_BivData$sex == "male",]
male.Dom <- male.Dom[male.Dom$subsp == "Dom",]

male.Dom$mouse <- as.factor(male.Dom$mouse)

male.Dom$strain <- factor(male.Dom$strain, ordered = FALSE)  # drop the ordering but keep the strain labels

male.Dom <- male.Dom[which(male.Dom$hand.foci.count == 1), ]  # which() also drops rows with NA counts

male.Dom$Foc1.PER <- male.Dom$Foci1 / male.Dom$chromosomeLength

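#'|' is not a nesting operator in lm() formulas; '/' nests fileName (cell) within mouse within strain
male.modo <- lm(Foc1.PER ~ strain / mouse / fileName, data = male.Dom)
summary(aov(male.modo))

#with the earlier '|' formula, only fileName registered as an effect
#note: these F tests all use the residual mean square; for nested random effects, each level should be tested against the level below it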
#Review ANOVA frameworks
#http://www.biostathandbook.com/nestedanova.html
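A minimal sketch of the nested strain > mouse > observation partition on simulated, balanced data (all values and effect sizes are made up). It uses `aov` with the `/` nesting operator and method-of-moments variance components, which are only valid for balanced designs. With the unbalanced real data, a mixed model such as `lme4::lmer(Foc1.PER ~ 1 + (1 | strain/mouse))` would be the more usual route, and a cell level would add one more `/fileName`.

```r
set.seed(1)

# Hypothetical balanced design: 3 strains x 4 mice per strain x 10 bivalents per mouse
n_strain <- 3
n_mouse  <- 4
n_obs    <- 10
d <- expand.grid(obs = seq_len(n_obs), mouse = seq_len(n_mouse),
                 strain = paste0("S", seq_len(n_strain)))
d$mouse  <- factor(paste(d$strain, d$mouse, sep = "_"))  # mouse IDs unique across strains
d$strain <- factor(d$strain)

# Made-up effects on a normalized position scale
strain_eff <- c(S1 = 0, S2 = 0.05, S3 = 0.10)
mouse_eff  <- setNames(rnorm(nlevels(d$mouse), sd = 0.03), levels(d$mouse))
d$F1_PER <- 0.4 + unname(strain_eff[as.character(d$strain)]) +
  unname(mouse_eff[as.character(d$mouse)]) + rnorm(nrow(d), sd = 0.1)

# '/' expands to strain + strain:mouse (mouse nested in strain)
fit <- aov(F1_PER ~ strain / mouse, data = d)
tab <- summary(fit)[[1]]
ms  <- setNames(tab[["Mean Sq"]], trimws(rownames(tab)))

# Method-of-moments variance components (valid for balanced data only)
var_resid  <- ms[["Residuals"]]
var_mouse  <- max(0, (ms[["strain:mouse"]] - var_resid) / n_obs)
var_strain <- max(0, (ms[["strain"]] - ms[["strain:mouse"]]) / (n_obs * n_mouse))

# Strain is tested against the mouse mean square, not the residual
f_strain <- ms[["strain"]] / ms[["strain:mouse"]]
p_strain <- pf(f_strain, df1 = n_strain - 1, df2 = n_strain * (n_mouse - 1),
               lower.tail = FALSE)

round(c(var_strain = var_strain, var_mouse = var_mouse,
        var_resid = var_resid, p_strain = p_strain), 4)
```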

Q2 Analysis Predictions: Male Polymorphism (High vs Low Rec Strains)

General predictions across males and subspecies, based on the MLH1 results above.

For traits/metrics positively correlated with recombination rate:

  1. in Dom strains, low to no difference across strains

  2. in Musc, PWD > SKIVE > the others (KAZ, CZECH)

  3. in Mol, MSM > MOLF

A.1) SC lengths will be longer for high rec strains.

B.1) Interference/IFD will be shorter in high rec strains. Use IFD_PER to account for SC length differences.

C.1) Not enough is known about within-species variation in normalized 1CO positions. Null prediction: no difference in the ‘telomeric pattern’.

Mouse averages for the other position metrics will be strongly influenced by the proportions of 1CO and 2CO bivalents. Once chromosome class and SC length are accounted for, there should be no difference; however, not enough is known about these patterns.
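The class-proportion effect can be shown with a worked mixture: a mouse average is the 1CO and 2CO class means weighted by the share of each class. A minimal sketch; `mixture_mean` and the position values (0.7 for 1CO, 0.3 for 2CO) are hypothetical.

```r
# Mouse average of a position metric as a mixture of the 1CO and 2CO class means
mixture_mean <- function(p_2co, mean_1co, mean_2co) {
  (1 - p_2co) * mean_1co + p_2co * mean_2co
}

# Same class means, different proportions of 2CO bivalents
mixture_mean(p_2co = 0.3, mean_1co = 0.7, mean_2co = 0.3)  # 0.58
mixture_mean(p_2co = 0.6, mean_1co = 0.7, mean_2co = 0.3)  # 0.46
```

The two mice differ by 0.12 in average position even though neither class has moved, which is why class and SC length need to be accounted for.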

C.2) sis-co-ten metric … (what about the clustering?)

C.3) telomere and centromere distances …

M1: glm with a fixed strain effect, fit to male mouse averages.

Q2 SC

Remember the three predictions by subspecies:

  1. Dom, no difference

  2. Musc, PWD and SKIVE greater than others

  3. Mol, MSM > MOLF

Using mouse-average SC lengths in the full data set, all strain effects are significant. This suggests there is more between-strain variation in SC length than in gwRR / CO counts.

The general pattern is that high rec strains have a greater mean SC length when all bivalents are pooled (5046 male bivalent observations).

Q2 SC by class

The predicted SC length pattern becomes more nuanced when the data are split by chromosome class: 1CO bivalents are shorter in the high rec strains than in the low rec strains. This is likely because, in the low rec strains, more of the physically longer chromosomes have 1CO, which pushes the 1CO mean SC length up, whereas in the high rec strains the physically longer chromosomes are more likely to fall in the 2CO group. This supports a general pattern of tighter clustering of SC lengths across chromosome classes in the high rec group: below a certain SC length threshold, they have a lower probability of 2COs.
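The composition argument can be checked with a toy threshold model. A minimal sketch under loud assumptions: both groups share identical SC lengths, and a bivalent is 2CO whenever its SC exceeds a group-specific threshold (hypothetical values 120 and 90). Lowering the threshold moves the longer chromosomes out of the 1CO class, so the 1CO mean drops.

```r
# Hypothetical: identical SC lengths in both groups; only the 2CO threshold differs
sc <- seq(60, 140, by = 10)

class_means <- function(sc, threshold) {
  is_2co <- sc > threshold
  c(mean_1CO = mean(sc[!is_2co]), mean_2CO = mean(sc[is_2co]))
}

class_means(sc, threshold = 120)  # "low rec":  mean_1CO 90, mean_2CO 135
class_means(sc, threshold = 90)   # "high rec": mean_1CO 75, mean_2CO 120
```

In this toy the 2CO mean also drops, so the longer 2CO SCs observed in the high rec strains would need the overall longer SCs of those strains, not class composition alone.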

I think these comparisons should be done with the bivalent observations. The logistic regression showed that rec group could be predicted by SC length; re-run it separately for each chromosome class.

## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength * strain, family = binomial(link = "logit"), 
##     data = Bivalent_1o2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3243  -0.6900  -0.4696   0.5792   2.6840  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  -6.267440   0.621154 -10.090   <2e-16 ***
## chromosomeLength              0.062024   0.007516   8.252   <2e-16 ***
## strainG                       0.486532   0.777194   0.626   0.5313    
## strainLEW                     0.159705   0.888083   0.180   0.8573    
## strainPWD                     1.138253   0.774194   1.470   0.1415    
## strainMSM                     2.167164   1.100226   1.970   0.0489 *  
## strainMOLF                    1.161952   0.905552   1.283   0.1994    
## strainSKIVE                   0.277228   0.786861   0.352   0.7246    
## strainKAZ                     1.331861   0.871914   1.528   0.1266    
## strainCZECH                   0.578077   1.102026   0.525   0.5999    
## chromosomeLength:strainG     -0.012078   0.009075  -1.331   0.1832    
## chromosomeLength:strainLEW   -0.004447   0.010417  -0.427   0.6695    
## chromosomeLength:strainPWD   -0.005375   0.009126  -0.589   0.5559    
## chromosomeLength:strainMSM   -0.013466   0.012688  -1.061   0.2886    
## chromosomeLength:strainMOLF  -0.016616   0.010457  -1.589   0.1121    
## chromosomeLength:strainSKIVE  0.001995   0.009369   0.213   0.8314    
## chromosomeLength:strainKAZ   -0.025109   0.010078  -2.491   0.0127 *  
## chromosomeLength:strainCZECH -0.011721   0.012328  -0.951   0.3417    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5617.8  on 4909  degrees of freedom
## Residual deviance: 4553.7  on 4892  degrees of freedom
## AIC: 4589.7
## 
## Number of Fisher Scoring iterations: 5

These are the plots of the logistic regression.

The logistic regression plot for 1COs is reversed: in Musc, the mean SC lengths for 1COs are longer in the low rec group.

## [1] 0.002380373
## [1] 0.999271
## [1] 0.03427751

The first t-test (p = 0.0024) indicates that when all mouse averages are pooled, SC lengths differ significantly between rec groups. When the means are compared within chromosome class, the differences weaken: one comparison is not significant (p = 0.999) and the other is only marginally significant (p = 0.034).

Q2 SC length Chrm class prediction

How well does SC length predict chromosome class? Prediction: high rec strains will have a more significant p-value, given the lower overlap in SC lengths across chromosome classes. (This test can only be done with bivalent-level observations, because chromosome class, 1CO or 2CO, is defined per bivalent.)
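Per-strain p-values also scale with the number of bivalents (from 168 in MSM to 980 in G in the fits below), so comparing slopes, or testing the strain-by-length interaction directly, may be a cleaner comparison. A minimal sketch on simulated data; the two strains, their slopes, and the sample size are assumptions. It compares an additive model with the interaction model (the same form as the Bivalent_1o2 interaction fit above) via a likelihood-ratio test.

```r
set.seed(42)

# Hypothetical bivalent table for two strains; slopes are assumed, not estimated
n <- 400
biv <- data.frame(
  strain = factor(rep(c("A", "B"), each = n / 2)),
  chromosomeLength = runif(n, min = 40, max = 160)
)
slope <- ifelse(biv$strain == "A", 0.05, 0.09)
p_2co <- plogis(slope * (biv$chromosomeLength - 100))
biv$chrm.class <- rbinom(n, size = 1, prob = p_2co)  # 1 = 2CO, 0 = 1CO

# Shared SC-length slope vs. strain-specific slopes
m_add <- glm(chrm.class ~ chromosomeLength + strain, family = binomial, data = biv)
m_int <- glm(chrm.class ~ chromosomeLength * strain, family = binomial, data = biv)

# Likelihood-ratio test: does the SC-length slope differ between strains?
lrt <- anova(m_add, m_int, test = "Chisq")
p_slope_diff <- lrt[["Pr(>Chi)"]][2]

# Per-strain slopes on the logit scale, comparable across strains of different n
slopes <- sapply(split(biv, biv$strain), function(d)
  coef(glm(chrm.class ~ chromosomeLength, family = binomial, data = d))[["chromosomeLength"]])

list(p_slope_diff = p_slope_diff, slopes = slopes)
```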

## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9696  -0.7459  -0.5069   0.7601   2.5488  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -5.545924   0.187360  -29.60   <2e-16 ***
## chromosomeLength  0.053119   0.002097   25.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5617.8  on 4909  degrees of freedom
## Residual deviance: 4806.9  on 4908  degrees of freedom
## AIC: 4810.9
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2_WSB)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5319  -0.6194  -0.4507  -0.2544   2.5795  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -6.267440   0.621154 -10.090   <2e-16 ***
## chromosomeLength  0.062024   0.007516   8.252   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 554.21  on 578  degrees of freedom
## Residual deviance: 469.39  on 577  degrees of freedom
## AIC: 473.39
## 
## Number of Fisher Scoring iterations: 5
## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2_G)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5583  -0.6425  -0.4673  -0.2974   2.6141  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -5.780909   0.467117 -12.376   <2e-16 ***
## chromosomeLength  0.049946   0.005085   9.822   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 943.65  on 979  degrees of freedom
## Residual deviance: 829.78  on 978  degrees of freedom
## AIC: 833.78
## 
## Number of Fisher Scoring iterations: 5
## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2_LEW)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7127  -0.6509  -0.4601  -0.2709   2.6840  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -6.107735   0.634711  -9.623  < 2e-16 ***
## chromosomeLength  0.057577   0.007213   7.982 1.43e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 523.62  on 514  degrees of freedom
## Residual deviance: 442.66  on 513  degrees of freedom
## AIC: 446.66
## 
## Number of Fisher Scoring iterations: 5
## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2_PWD)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3243  -0.8734  -0.4875   1.0041   1.9707  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -5.129187   0.462109  -11.10   <2e-16 ***
## chromosomeLength  0.056649   0.005176   10.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 917.11  on 665  degrees of freedom
## Residual deviance: 752.69  on 664  degrees of freedom
## AIC: 756.69
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2_SKIVE)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0949  -0.8146  -0.4626   0.9584   2.2585  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -5.990212   0.483009  -12.40   <2e-16 ***
## chromosomeLength  0.064019   0.005593   11.45   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 970.27  on 763  degrees of freedom
## Residual deviance: 788.24  on 762  degrees of freedom
## AIC: 792.24
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2_KAZ)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0743  -0.5927  -0.4657  -0.3384   2.5037  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -4.935580   0.611884  -8.066 7.25e-16 ***
## chromosomeLength  0.036915   0.006714   5.499 3.83e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 481.68  on 585  degrees of freedom
## Residual deviance: 448.88  on 584  degrees of freedom
## AIC: 452.88
## 
## Number of Fisher Scoring iterations: 5
## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2_CZECH)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6211  -0.6599  -0.4998  -0.3007   2.4588  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -5.689363   0.910180  -6.251 4.08e-10 ***
## chromosomeLength  0.050303   0.009771   5.148 2.63e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 240.36  on 228  degrees of freedom
## Residual deviance: 208.49  on 227  degrees of freedom
## AIC: 212.49
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2_MSM)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2329  -1.0478   0.5853   1.0309   1.6173  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -4.10028    0.90811  -4.515 6.33e-06 ***
## chromosomeLength  0.04856    0.01022   4.750 2.03e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 231.73  on 167  degrees of freedom
## Residual deviance: 203.73  on 166  degrees of freedom
## AIC: 207.73
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = chrm.class ~ chromosomeLength, family = binomial(link = "logit"), 
##     data = Bivalent_1o2_MOLF)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6931  -0.7183  -0.5326  -0.3092   2.2596  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -5.105488   0.658905  -7.748  9.3e-15 ***
## chromosomeLength  0.045408   0.007269   6.247  4.2e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 455.53  on 422  degrees of freedom
## Residual deviance: 409.79  on 421  degrees of freedom
## AIC: 413.79
## 
## Number of Fisher Scoring iterations: 4

Summary

When SC lengths are divided by CO number (chromosome class), the direction of the logistic regression differs. I think the shorter 1CO bivalents in the high rec strains reflect the clustering of groups by SC length (i.e. the scatter plots of SC length, binned by chromosome class on the x-axis).

Each point is a mouse-average SC length. When all bivalents are pooled, SC length predicts rec group in the logistic regression. When the data are divided by CO number, 1COs have longer SCs in the low group: in the low group, more chromosomes, including the longer ones, have 1CO, whereas in the high group only the shorter chromosomes (by SC) are in the 1CO class. In the 2CO class, where all the SCs are longer, the high group averages were longer than the low group averages.

The mean 2CO SC lengths are higher in the high strains (this likely reflects the physical size effect).

If I want to model mouse nested in strain, I should use cell-level metrics (but those have their own flaws).

Ideas for the three tests above:

  1. no difference in DOM
  • no significant logistic regression
  • glm, strain effect?

Is sex a significant effect on SC length (as predicted)?

  • The results indicate that sex is a significant effect on SC length. Consider writing a subsampling approach (randomize / permute a BivData data set) to confirm this.

  • According to anova, sex effect explains most of the variance in single bivalent SC lengths.

  • The Long Biv Data set largely agrees with the full curated dataset

  • A caveat I haven't addressed yet is the inclusion of the XX bivalent in the female BivData averages.
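A minimal sketch of the permutation approach, under the assumption that labels are shuffled among mice (mouse averages) rather than among bivalents, since bivalents from the same mouse are not independent. `perm_test_diff` is a hypothetical helper and the SC values are invented, not study data.

```r
# Permutation test for a difference in group means (e.g. female vs male SC length).
# Intended for mouse-level averages: bivalents from one mouse are not independent.
perm_test_diff <- function(y, group, n_perm = 9999, seed = 1) {
  set.seed(seed)
  group <- factor(group)
  stopifnot(nlevels(group) == 2)
  observed <- as.numeric(diff(tapply(y, group, mean)))
  # Shuffle group labels among mice and recompute the difference each time
  permuted <- replicate(n_perm, as.numeric(diff(tapply(y, sample(group), mean))))
  p_value <- (sum(abs(permuted) >= abs(observed)) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

# Hypothetical mouse-average SC lengths
sc_mouse <- c(150, 155, 160, 158, 152, 120, 118, 125, 122, 119)
sex      <- rep(c("female", "male"), each = 5)
res <- perm_test_diff(sc_mouse, sex)
res
```

The `+ 1` in the numerator and denominator counts the observed labelling as one of the permutations, which keeps the p-value above zero.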

The mean SC logistic regression model for mouse averages

-10.1979118, 0.1331859

and for the single-bivalent level

0.2095246, 0.0039244

When all male mice are used, the predictive power is greater than when only the Musc strains are used. With only the Musc strains, mouse mean SC is marginally significant in predicting whether a mouse is in the high or low rec group (should I also run this on females?).

Is the prediction that high rec Musc male strains have longer SCs met?

In a logistic regression, mouse-average SC length is slightly predictive of whether a mouse is in a high or low rec strain. I couldn't get the mixed models working for the male polymorphism predictions…

Q2 Normalized CO positions

A major caveat to address in this section is the chromosome size effect; use real.long.bivData.

Alternatively, include chromosome length as an effect? That would only work for models using single bivalents, not mouse averages.

Since there aren't many good predictions for how the normalized 1CO landscape relates to gwRR variation, these will be the main questions:

  • Has there been evolution in the 1CO pattern?

  • Is there a consistent pattern with the high rec strains?

Normalized 1CO focus positions (F1_PER) will be used. Check fixed effects of subspecies and strain.

Sis-co-ten and the telomere and centromere distances of foci draw from a wider pool of samples (so I may use them, but without focusing on them).

## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing non-finite values (stat_boxplot).
## Warning: Removed 1 rows containing missing values (geom_point).

I remade the position table: MSM has a more telomeric distribution, and MOLF has the most central distribution.

The plot for the long bivalents shows that PWD, SKIVE and MSM have, on average, more centrally placed F1 positions. It's strange that the glms/models suggest WSB has the most terminal F1; however, for the long bivalents, WSB has the lowest mean of all the strains.

But there are very few observations. The plot suggests there are fewer 1CO bivalents in the high rec group; I'm not sure there are enough to feed into models.

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## [1] 0.9097813
## [1] 0.2379652

Normalized focus positions do not differ significantly between the high and low rec groups in the full data set, at either the mouse-average or the single-bivalent level (p = 0.91 and p = 0.24).

Summary

There is no clear pattern relating normalized 1CO position to rec group.

Q2 IFD

This is the Q2 prediction for IFD / interference.

Prediction A: is the male polymorphism prediction met, i.e. do high rec strains have shorter IFDs?

B.1) Interference/IFD will be shorter in high rec strains. Use IFD_PER to account for SC length differences.

  1. glm across subsp

  2. anova within subsp

  3. logistic regression for Musc (or all males)

## Warning: Removed 4 rows containing non-finite values (stat_boxplot).
## Warning: Removed 4 rows containing missing values (geom_point).

The plots above do not show well that the high rec group has slightly higher raw IFDs, which is expected because their SCs are longer on average. The normalized plot shows that the high rec group has slightly longer IFDs.
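Why raw IFD tracks SC length can be seen with two toy bivalents. A minimal sketch; the `ifd` helper and its column names are hypothetical, and it assumes Foci1/Foci2 are positions along the SC in the same units as chromosomeLength (the notebook's mean_IFD.2CO_ABS / _PER columns may be computed differently).

```r
# Hypothetical helper: inter-focus distance for 2CO bivalents
ifd <- function(foci1, foci2, sc_length) {
  ifd_abs <- abs(foci2 - foci1)
  data.frame(IFD_ABS = ifd_abs, IFD_PER = ifd_abs / sc_length)
}

# Two toy bivalents with the same relative spacing but different SC lengths
toy <- ifd(foci1 = c(20, 30), foci2 = c(70, 105), sc_length = c(100, 150))
toy
```

The longer bivalent has a 50% longer raw IFD (75 vs 50) but the same normalized IFD (0.5), so only IFD_PER separates spacing from SC length.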

## 
## Call:
## glm(formula = mean_IFD.2CO_ABS ~ subsp * strain, data = Male.poly.Mouse.Table_BivData_4MM)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -7.7614  -3.3285  -0.3448   2.7876  12.6333  
## 
## Coefficients: (18 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            47.9445     2.2048  21.746   <2e-16 ***
## subspMusc               3.5507     3.6004   0.986   0.3306    
## subspMol                0.1004     3.3072   0.030   0.9760    
## strainG                 3.8170     2.7499   1.388   0.1736    
## strainLEW               3.9088     3.1180   1.254   0.2181    
## strainPWD               8.5103     3.6004   2.364   0.0236 *  
## strainMSM               6.4044     3.4861   1.837   0.0745 .  
## strainMOLF                  NA         NA      NA       NA    
## strainSKIVE             8.3345     3.4021   2.450   0.0193 *  
## strainKAZ              -3.4689     4.0254  -0.862   0.3945    
## strainCZECH                 NA         NA      NA       NA    
## subspMusc:strainG           NA         NA      NA       NA    
## subspMol:strainG            NA         NA      NA       NA    
## subspMusc:strainLEW         NA         NA      NA       NA    
## subspMol:strainLEW          NA         NA      NA       NA    
## subspMusc:strainPWD         NA         NA      NA       NA    
## subspMol:strainPWD          NA         NA      NA       NA    
## subspMusc:strainMSM         NA         NA      NA       NA    
## subspMol:strainMSM          NA         NA      NA       NA    
## subspMusc:strainMOLF        NA         NA      NA       NA    
## subspMol:strainMOLF         NA         NA      NA       NA    
## subspMusc:strainSKIVE       NA         NA      NA       NA    
## subspMol:strainSKIVE        NA         NA      NA       NA    
## subspMusc:strainKAZ         NA         NA      NA       NA    
## subspMol:strainKAZ          NA         NA      NA       NA    
## subspMusc:strainCZECH       NA         NA      NA       NA    
## subspMol:strainCZECH        NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 24.3055)
## 
##     Null deviance: 1781.5  on 44  degrees of freedom
## Residual deviance:  875.0  on 36  degrees of freedom
## AIC: 281.24
## 
## Number of Fisher Scoring iterations: 2
## 
## Call:
## glm(formula = mean_IFD.2CO_PER ~ subsp * strain, data = Male.poly.Mouse.Table_BivData_4MM)
## 
## Deviance Residuals: 
##       Min         1Q     Median         3Q        Max  
## -0.069618  -0.018333  -0.002938   0.018642   0.084977  
## 
## Coefficients: (18 not defined because of singularities)
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.542597   0.016414  33.057  < 2e-16 ***
## subspMusc             -0.017080   0.026804  -0.637  0.52802    
## subspMol              -0.013032   0.024621  -0.529  0.59985    
## strainG               -0.002548   0.020472  -0.124  0.90164    
## strainLEW              0.012029   0.023213   0.518  0.60749    
## strainPWD              0.079650   0.026804   2.972  0.00525 ** 
## strainMSM              0.045054   0.025953   1.736  0.09112 .  
## strainMOLF                   NA         NA      NA       NA    
## strainSKIVE            0.115751   0.025327   4.570 5.54e-05 ***
## strainKAZ             -0.005417   0.029968  -0.181  0.85757    
## strainCZECH                  NA         NA      NA       NA    
## subspMusc:strainG            NA         NA      NA       NA    
## subspMol:strainG             NA         NA      NA       NA    
## subspMusc:strainLEW          NA         NA      NA       NA    
## subspMol:strainLEW           NA         NA      NA       NA    
## subspMusc:strainPWD          NA         NA      NA       NA    
## subspMol:strainPWD           NA         NA      NA       NA    
## subspMusc:strainMSM          NA         NA      NA       NA    
## subspMol:strainMSM           NA         NA      NA       NA    
## subspMusc:strainMOLF         NA         NA      NA       NA    
## subspMol:strainMOLF          NA         NA      NA       NA    
## subspMusc:strainSKIVE        NA         NA      NA       NA    
## subspMol:strainSKIVE         NA         NA      NA       NA    
## subspMusc:strainKAZ          NA         NA      NA       NA    
## subspMol:strainKAZ           NA         NA      NA       NA    
## subspMusc:strainCZECH        NA         NA      NA       NA    
## subspMol:strainCZECH         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.001347089)
## 
##     Null deviance: 0.122045  on 44  degrees of freedom
## Residual deviance: 0.048495  on 36  degrees of freedom
## AIC: -159.78
## 
## Number of Fisher Scoring iterations: 2
## 
## Call:
## glm(formula = mean_IFD.2CO_ABS ~ strain, data = Male.poly.Mouse.Table_BivData_4MM)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -7.7614  -3.3285  -0.3448   2.7876  12.6333  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 47.94446    2.20479  21.746  < 2e-16 ***
## strainG      3.81697    2.74986   1.388 0.173649    
## strainLEW    3.90881    3.11804   1.254 0.218064    
## strainPWD   12.06096    3.11804   3.868 0.000442 ***
## strainMSM    6.50475    3.30719   1.967 0.056945 .  
## strainMOLF   0.10039    3.30719   0.030 0.975952    
## strainSKIVE 11.88519    2.88675   4.117 0.000214 ***
## strainKAZ    0.08182    3.60041   0.023 0.981995    
## strainCZECH  3.55071    3.60041   0.986 0.330619    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 24.3055)
## 
##     Null deviance: 1781.5  on 44  degrees of freedom
## Residual deviance:  875.0  on 36  degrees of freedom
## AIC: 281.24
## 
## Number of Fisher Scoring iterations: 2
## 
## Call:
## glm(formula = mean_IFD.2CO_ABS ~ strain, data = Male.poly.Mouse.Table_BivData_4MM[Male.poly.Mouse.Table_BivData_4MM$subsp == 
##     "Dom", ])
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -7.761  -3.365  -1.198   4.777  12.633  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   47.944      2.595  18.473 3.24e-12 ***
## strainG        3.817      3.237   1.179    0.256    
## strainLEW      3.909      3.670   1.065    0.303    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 33.68113)
## 
##     Null deviance: 593.53  on 18  degrees of freedom
## Residual deviance: 538.90  on 16  degrees of freedom
## AIC: 125.48
## 
## Number of Fisher Scoring iterations: 2
## 
## Call:
## glm(formula = mean_IFD.2CO_ABS ~ strain, data = Male.poly.Mouse.Table_BivData_4MM[Male.poly.Mouse.Table_BivData_4MM$subsp == 
##     "Musc", ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.7927  -3.2253   0.4061   2.2888   8.7418  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  60.0054     1.8783  31.947 1.75e-14 ***
## strainSKIVE  -0.1758     2.4592  -0.071  0.94403    
## strainKAZ   -11.9791     3.0672  -3.906  0.00158 ** 
## strainCZECH  -8.5103     3.0672  -2.775  0.01491 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 17.63917)
## 
##     Null deviance: 676.54  on 17  degrees of freedom
## Residual deviance: 246.95  on 14  degrees of freedom
## AIC: 108.22
## 
## Number of Fisher Scoring iterations: 2
## 
## Call:
## glm(formula = mean_IFD.2CO_ABS ~ strain, data = Male.poly.Mouse.Table_BivData_4MM[Male.poly.Mouse.Table_BivData_4MM$subsp == 
##     "Mol", ])
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -7.735  -0.250   1.027   2.280   2.837  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   54.449      1.927   28.25  1.3e-07 ***
## strainMOLF    -6.404      2.726   -2.35   0.0571 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 14.85856)
## 
##     Null deviance: 171.183  on 7  degrees of freedom
## Residual deviance:  89.151  on 6  degrees of freedom
## AIC: 47.99
## 
## Number of Fisher Scoring iterations: 2
## 
## Call:
## glm(formula = mean_IFD.2CO_PER ~ strain, data = Male.poly.Mouse.Table_BivData_4MM)
## 
## Deviance Residuals: 
##       Min         1Q     Median         3Q        Max  
## -0.069618  -0.018333  -0.002938   0.018642   0.084977  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.542597   0.016414  33.057  < 2e-16 ***
## strainG     -0.002548   0.020472  -0.124   0.9016    
## strainLEW    0.012029   0.023213   0.518   0.6075    
## strainPWD    0.062570   0.023213   2.695   0.0106 *  
## strainMSM    0.032022   0.024621   1.301   0.2017    
## strainMOLF  -0.013032   0.024621  -0.529   0.5999    
## strainSKIVE  0.098671   0.021491   4.591  5.2e-05 ***
## strainKAZ   -0.022497   0.026804  -0.839   0.4068    
## strainCZECH -0.017080   0.026804  -0.637   0.5280    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.001347089)
## 
##     Null deviance: 0.122045  on 44  degrees of freedom
## Residual deviance: 0.048495  on 36  degrees of freedom
## AIC: -159.78
## 
## Number of Fisher Scoring iterations: 2
## 
## Call:
## glm(formula = mean_IFD.2CO_PER ~ strain, data = Male.poly.Mouse.Table_BivData_4MM[Male.poly.Mouse.Table_BivData_4MM$subsp == 
##     "Dom", ])
## 
## Deviance Residuals: 
##       Min         1Q     Median         3Q        Max  
## -0.069618  -0.022193  -0.004142   0.024264   0.084977  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.542597   0.019011  28.540 3.76e-15 ***
## strainG     -0.002548   0.023712  -0.107    0.916    
## strainLEW    0.012029   0.026886   0.447    0.661    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.001807185)
## 
##     Null deviance: 0.029624  on 18  degrees of freedom
## Residual deviance: 0.028915  on 16  degrees of freedom
## AIC: -61.349
## 
## Number of Fisher Scoring iterations: 2
## 
## Call:
## glm(formula = mean_IFD.2CO_PER ~ strain, data = Male.poly.Mouse.Table_BivData_4MM[Male.poly.Mouse.Table_BivData_4MM$subsp == 
##     "Musc", ])
## 
## Deviance Residuals: 
##       Min         1Q     Median         3Q        Max  
## -0.038754  -0.014849  -0.000845   0.014289   0.042644  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.60517    0.01155  52.400  < 2e-16 ***
## strainSKIVE  0.03610    0.01512   2.387 0.031619 *  
## strainKAZ   -0.08507    0.01886  -4.511 0.000489 ***
## strainCZECH -0.07965    0.01886  -4.223 0.000851 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.0006669007)
## 
##     Null deviance: 0.0559623  on 17  degrees of freedom
## Residual deviance: 0.0093366  on 14  degrees of freedom
## AIC: -75.074
## 
## Number of Fisher Scoring iterations: 2
## 
## Call:
## glm(formula = mean_IFD.2CO_PER ~ strain, data = Male.poly.Mouse.Table_BivData_4MM[Male.poly.Mouse.Table_BivData_4MM$subsp == 
##     "Mol", ])
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.06213  -0.00841   0.01417   0.02132   0.04058  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.57462    0.02066  27.814 1.43e-07 ***
## strainMOLF  -0.04505    0.02922  -1.542    0.174    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.001707272)
## 
##     Null deviance: 0.014303  on 7  degrees of freedom
## Residual deviance: 0.010244  on 6  degrees of freedom
## AIC: -24.581
## 
## Number of Fisher Scoring iterations: 2
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## 
## Call:
## glm(formula = Rec.group ~ mean_IFD.2CO_PER, family = binomial(link = "logit"), 
##     data = Male.poly.Mouse.Table_BivData_4MM)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0150  -0.4870  -0.1991   0.3942   2.8025  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -30.303      8.777  -3.452 0.000556 ***
## mean_IFD.2CO_PER   51.505     15.034   3.426 0.000612 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 58.574  on 44  degrees of freedom
## Residual deviance: 30.626  on 43  degrees of freedom
## AIC: 34.626
## 
## Number of Fisher Scoring iterations: 6
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## 
## Call:
## glm(formula = Rec.group ~ mean_IFD.2CO_PER, family = binomial(link = "logit"), 
##     data = Male.poly.Mouse.Table_BivData_4MM[Male.poly.Mouse.Table_BivData_4MM$subsp == 
##         "Musc", ])
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -2.417e-05  -2.110e-08   2.110e-08   2.110e-08   2.725e-05  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)         -1180    1283767  -0.001    0.999
## mean_IFD.2CO_PER     2038    2212381   0.001    0.999
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2.2915e+01  on 17  degrees of freedom
## Residual deviance: 1.3284e-09  on 16  degrees of freedom
## AIC: 4
## 
## Number of Fisher Scoring iterations: 25
## 
## Call:
## glm(formula = Rec.group ~ mean_IFD.2CO_PER, family = binomial(link = "logit"), 
##     data = Male.poly.Mouse.Table_BivData_4MM[Male.poly.Mouse.Table_BivData_4MM$subsp == 
##         "Mol", ])
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.13902  -1.06184   0.07054   0.74692   1.81847  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)        -19.22      15.08  -1.274    0.203
## mean_IFD.2CO_PER    34.69      27.08   1.281    0.200
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 11.0904  on 7  degrees of freedom
## Residual deviance:  8.4622  on 6  degrees of freedom
## AIC: 12.462
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ IFD1_PER, family = binomial(link = "logit"), 
##     data = male.bivdata.2CO_IFD)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7053  -1.1591   0.8338   1.1154   1.9156  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.8937     0.2442  -7.753 8.94e-15 ***
## IFD1_PER      3.3385     0.4097   8.149 3.66e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1757.6  on 1267  degrees of freedom
## Residual deviance: 1683.9  on 1266  degrees of freedom
##   (4 observations deleted due to missingness)
## AIC: 1687.9
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ IFD1_PER, family = binomial(link = "logit"), 
##     data = male.bivdata.2CO_IFD[male.bivdata.2CO_IFD$subsp == 
##         "Musc", ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4046   0.4483   0.5623   0.6786   1.4537  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.9125     0.3915  -2.331   0.0198 *  
## IFD1_PER      4.0587     0.6837   5.937 2.91e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 677.59  on 685  degrees of freedom
## Residual deviance: 640.56  on 684  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 644.56
## 
## Number of Fisher Scoring iterations: 4
## 
## Call:
## glm(formula = Rec.group ~ IFD1_PER, family = binomial(link = "logit"), 
##     data = male.bivdata.2CO_IFD[male.bivdata.2CO_IFD$subsp == 
##         "Mol", ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6113  -1.1183  -0.5581   1.0986   1.7937  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.1537     0.6513  -3.307 0.000944 ***
## IFD1_PER      3.7986     1.1356   3.345 0.000822 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 259.1  on 186  degrees of freedom
## Residual deviance: 246.5  on 185  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 250.5
## 
## Number of Fisher Scoring iterations: 4
## [1] 1.970516e-17
## [1] 6.871734e-07

The t-tests return highly significant p-values for normalized IFD between the high and low rec groups, at both the mouse-average and single-bivalent levels.
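This kind of two-group comparison can be sketched with R's `t.test()` formula interface. The column names `Rec.group` and `IFD1_PER` are borrowed from the model output above, but the data frame `ifd_demo` and its values are simulated placeholders, not the real measurements. `t.test()` defaults to Welch's unequal-variance test; whether the original analysis used that default is an assumption.

```r
# Simulated placeholder data (hypothetical): 20 low-rec and 20 high-rec observations
set.seed(42)
ifd_demo <- data.frame(
  Rec.group = factor(rep(c("low", "high"), each = 20)),
  IFD1_PER  = c(rnorm(20, mean = 0.50, sd = 0.05),
                rnorm(20, mean = 0.58, sd = 0.05))
)

# Welch two-sample t-test of normalized IFD between the two rec groups
res <- t.test(IFD1_PER ~ Rec.group, data = ifd_demo)
res$p.value
```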

Q2 Long Bivalent Dataset

Examine the pattern in the long bivalent data set to get a feel for whether the chromosome size effect skews the pattern. In this data set, the longest bivalents (top 25%) from each cell are isolated, so they are more likely to share the same chromosome identities (Chrm1 to Chrm5). The IFD section only looks at 2CO bivalents.
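One way to pull out the top quartile of bivalents within each cell is to compare every bivalent's SC length with the 75th percentile of its own cell. This is a minimal sketch on simulated data; the data frame `biv` and the columns `cell` and `SC_length` are hypothetical names, and a strict `>` cutoff on the per-cell 75th percentile is one assumed definition of "top 25%".

```r
# Simulated placeholder data (hypothetical): 4 cells x 19 bivalents
set.seed(7)
biv <- data.frame(
  cell      = rep(paste0("cell", 1:4), each = 19),
  SC_length = runif(76, min = 4, max = 14)
)

# 75th percentile of SC length within each bivalent's own cell
biv$q75 <- ave(biv$SC_length, biv$cell,
               FUN = function(x) quantile(x, 0.75))

# Keep only bivalents above their cell's 75th percentile
long_biv <- biv[biv$SC_length > biv$q75, ]
table(long_biv$cell)
```

With 19 distinct lengths per cell, this cutoff keeps 5 bivalents per cell.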

table(droplevels(male.long.biv.2CO_IFD$strain))

The plot above shows the normalized IFD measures from 2CO bivalents.

There are far fewer observations in the long bivalent data set, but the same positive correlation with

  • table of long bivalents

  • glms

  • logistic regression

There are not enough observations to subset the data by strain or to use mouse averages.

Neither t-test is significant, for either ABS or PER, when I test just the Musc strains. The t-tests above are breaking knitr.

The t-tests for IFD1 at the bivalent level between the high and low rec Musc males are significant.

None of the logistic regression models for ABS or PER IFD lengths are significant, even when just the Musc strains are used.
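When a logistic model comes back non-significant, it is worth checking for separation. The Musc-only mouse-average model earlier in this output (residual deviance ~1e-09, 25 Fisher scoring iterations, standard errors in the millions, and the "fitted probabilities numerically 0 or 1" warning) has the classic signature of complete separation: the predictor perfectly splits the two groups, so the maximum-likelihood estimate does not exist and its Wald p-values near 1 are not informative. This sketch reproduces that behaviour on simulated data; the data frame `mice` is hypothetical, and only the column names are borrowed from the output.

```r
# Simulated placeholder data (hypothetical): 18 mice, 9 per rec group,
# with non-overlapping IFD ranges so the groups are perfectly separated
set.seed(3)
mice <- data.frame(
  Rec.group        = rep(c(0, 1), each = 9),
  mean_IFD.2CO_PER = c(runif(9, 0.50, 0.58), runif(9, 0.60, 0.66))
)

# Same model form as above; glm() warns under separation, so silence it here
fit <- suppressWarnings(
  glm(Rec.group ~ mean_IFD.2CO_PER,
      family = binomial(link = "logit"), data = mice)
)
coefs <- summary(fit)$coefficients
coefs  # inflated standard errors and z values near 0

# Direct check: does every low-group value lie below every high-group value?
separated <- with(mice,
  max(mean_IFD.2CO_PER[Rec.group == 0]) < min(mean_IFD.2CO_PER[Rec.group == 1]))
separated
```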

Preliminary results from an independent data set indicated that PWD had longer IFDs, which goes against the simple prediction that more COs means denser spacing of foci on the same bivalent. This also indicated that interference distance may have evolved in the house mouse complex.

Caveats

Put all of the code chunks/analysis for caveats here.

Chromosome Size Effect

I tried to isolate bivalents that are in the top quartile for SC length within their cells. (Rethink where this section should go.)

Below are examples of plots of SC length distributions across cells. The top figure shows whole-cell, hand-measured data, and the bottom shows the automated bivData from cells with at least 15 bivalents measured. Most plots are excluded for space.

Each point is a bivalent, plotted by cell on the x axis. X's mark the 4th quartile, the big point is the mean, and the smaller black point is the median. I'm using these to compare the patterns of these statistics in the automated data set, which is missing some bivalent data. (The extra stats are not correctly mapped.)
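The per-cell summaries in these plots can be computed with base R's `aggregate()`. This sketch uses a hypothetical data frame `biv` with simulated SC lengths, applies the "at least 15 bivalents measured" filter described above, and assumes the X marker is the 75th percentile of each cell's SC lengths; the plotting itself is left out.

```r
# Simulated placeholder data (hypothetical): three cells with 20, 17 and 12 bivalents
set.seed(11)
biv <- data.frame(
  cell      = rep(c("cellA", "cellB", "cellC"), times = c(20, 17, 12)),
  SC_length = runif(49, min = 4, max = 14)
)

# Keep only cells with at least 15 measured bivalents
n_per_cell <- table(biv$cell)
biv_15 <- biv[biv$cell %in% names(n_per_cell)[n_per_cell >= 15], ]

# Per-cell mean, median and 75th percentile (assumed meaning of the X marker)
cell_stats <- aggregate(SC_length ~ cell, data = biv_15,
                        FUN = function(x) c(mean   = mean(x),
                                            median = median(x),
                                            q75    = unname(quantile(x, 0.75))))
# Flatten the matrix column into SC_length.mean, SC_length.median, SC_length.q75
cell_stats <- do.call(data.frame, cell_stats)
cell_stats
```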

## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 9 rows containing missing values (geom_point).

## Warning: Removed 9 rows containing missing values (geom_point).

## Warning: Removed 9 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 41 rows containing missing values (geom_point).

## Warning: Removed 41 rows containing missing values (geom_point).

## Warning: Removed 41 rows containing missing values (geom_point).

All the plots above compare the distributions of manually measured whole-cell SC lengths with the SC length distributions from the automated bivalent data. They show the amount of within-cell variance across strains. There is some variance across the SC length distributions in the PWD females.

This data set might be noisy, given the amount of variance in the SC length distributions across cells (PWD females, WSB females).

The data frame real.long.bivData contains 713 bivalent measures; the full data set contains 9807. The breakdown of bivalent observations by category for this long data set is:

  • Try to merge this DF with the whole.cell manual measures.

  • Try estimating which ‘loose’ bivalent observations might fall within the long class of bivalents.

In the code chunk above, I ran the mouse averages for the longest bivalents (680 bivalents from 54 mice; 10202 bivalents from 86 mice).

## Warning: Removed 3 rows containing missing values (geom_point).

These plots show the SC lengths for the ‘long SC data set’. They are supposed to be the longest 4-5 SCs from cells where I could get good measures. These longer bivalents are useful because their patterns shouldn't be affected by the chromosome size effect (which affects CO position). Hopefully this data set will have less noise from chromosome identity, but some data are still missing (they don't come from whole-cell measures).

Adjusting for XX

(Rethink where this section should go.) New outline:

  1. Illustrate the problem (affects mostly SC length).

  2. Expected impact on sex comparisons; estimated effect size of the X.
  3. Prove the general pattern that ALL bivalents are longer; chromosomes sorted by bin comparisons.

  4. 19 female, 19 male, 20 female, 20 male.

The female mouse averages should be adjusted for the XX. I am working on code to estimate the SC length of the 3rd largest bivalent from the female whole-cell data across strains, and then subtract this amount from the female mouse averages. This isn't the best solution, since I can't determine what proportion of the cells in the female mouse averages include the XX (most cells are missing at least 3 bivalents).

  • Of all female single-bivalent observations, 5% are XX (1 of 20).

  • The XX is large, likely within the top 25% longest bivalents of the cell (3rd largest by Mb).

  • The average % of XX for whole-cell SC (sum of all bivalents) can be calculated from the whole.cell data set. Let's guess that 12% of a cell's total SC area is XX.

I think a formula like this could be applied to adjust for the XX:

  • rate of bivalent segmentation * rate of XX (5%) * mean SC length of the 3rd longest bivalent / total SC area (by bivalent) = proportion of SC area due to XX
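Taken at face value, the formula above is a simple product and ratio. This sketch wraps it in a hypothetical helper, `xx_area_proportion()`; every input value below is a placeholder chosen for illustration, not an estimate from these data.

```r
# Hypothetical helper implementing the proposed XX adjustment formula:
# segmentation rate * XX rate * mean SC of 3rd longest bivalent / total SC area
xx_area_proportion <- function(seg_rate, xx_rate, xx_mean_SC, total_SC) {
  seg_rate * xx_rate * xx_mean_SC / total_SC
}

# Placeholder inputs only (not measured values)
prop_xx <- xx_area_proportion(seg_rate = 0.8, xx_rate = 1 / 20,
                              xx_mean_SC = 12, total_SC = 160)
prop_xx  # 0.8 * 0.05 * 12 / 160 = 0.003
```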

Whole Cell Manual Comparison

The plots above show the mean SC lengths and 2SE error bars for single bivalents that have been assigned a within-cell rank.

The first plot shows the mean SC length by rank (most of these have 3 cells, or observations; MSM has 5).

The purpose of these plots is to display the variance of single bivalents when they are assigned a within-cell rank. Among the longest bivalents, the XX is predicted to be the 3rd longest (by physical length in Mb).

(Use the value for the 3rd bivalent to adjust the single-bivalent traits for the XX, then compare to male values, or re-run in the MM.)

The other figure shows how much each single bivalent contributes to the total SC area. Each column is a cell, and each color is the percent of total SC area for one of the 5 longest bivalents in that cell. On average, each of the longest bivalents makes up ~10% of the cell's total SC area. So for cells with all 20 bivalents, 5-7% of the total SC area is due to the XX.
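Within-cell ranks and each bivalent's share of total SC length can both be computed with `ave()`. This is a minimal sketch on simulated whole-cell data (hypothetical data frame `whole_cell`, 20 bivalents per cell). Rank 1 is the longest bivalent, rank 3 is treated as the putative XX following the Mb ordering above, and the error bar is assumed to be 2 * sd / sqrt(n).

```r
# Simulated placeholder data (hypothetical): 3 whole cells x 20 bivalents
set.seed(5)
whole_cell <- data.frame(
  cell      = rep(paste0("cell", 1:3), each = 20),
  SC_length = runif(60, min = 4, max = 14)
)

# Rank bivalents within each cell by SC length (1 = longest)
whole_cell$rank <- ave(-whole_cell$SC_length, whole_cell$cell, FUN = rank)

# Percent of each cell's total SC length contributed by each bivalent
whole_cell$pct_total <- 100 * whole_cell$SC_length /
  ave(whole_cell$SC_length, whole_cell$cell, FUN = sum)

# Summarize the rank-3 bivalent (putative XX) across cells
third <- whole_cell[whole_cell$rank == 3, ]
xx_summary <- c(
  mean_SC  = mean(third$SC_length),
  two_SE   = 2 * sd(third$SC_length) / sqrt(nrow(third)),
  mean_pct = mean(third$pct_total)
)
xx_summary
```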

  • Is the difference between cell averages for males and females less than 10%?

  • Also interesting: PWD and MSM don't have longer SCs compared to other strains.

Automated BivData Comparison

## 
##    WSB      G    LEW   PERC    PWD    MSM   MOLF  SKIVE    KAZ    TOM 
##    767    726    714      0   1031    550      0      0      0      0 
##    AST  CZECH   CAST    HMI  SPRET   SPIC CAROLI     F1  other 
##      0      0      0      0      0      0      0      0      0

For the automated data set, I'd like to measure the rate of passing bivalents per cell. The mean pass rate will be multiplied by the estimated XX mean_SC.

The table above shows the number of bivalents from the same strains as in the manual whole-cell data. The plot shows the bivalent passing rate across all of the individual cells in this female data set. For each strain, I'll calculate the mean bivalent passing rate (maybe I should look at the mouse level).

(Some of the mice have different ranges of per-cell passing rate.) Given these ranges, I think the XX adjustment factor should be applied at the mouse level. (It could even be extended to the cell level, except I don't think the XX SC length estimates would be good enough at that level.)

strain.XX.adjustment.factor = per_cell_passing_rate * (1 of 20 random bivalents will be XX) * estimated XX mean_SC
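The mouse-level adjustment factor described above can be sketched under two stated assumptions: the per-cell passing rate is passing bivalents out of 20 expected per female cell, and `xx_mean_SC` is a placeholder rather than the real estimate. The data frame `cells` and its counts are hypothetical.

```r
# Hypothetical per-cell counts of bivalents that passed automated segmentation
cells <- data.frame(
  mouse  = rep(c("mouse1", "mouse2"), each = 5),
  n_pass = c(15, 17, 12, 18, 16, 9, 11, 14, 10, 12)
)

# Assumption: 20 bivalents are expected per female cell
cells$pass_rate <- cells$n_pass / 20

# Mean passing rate per mouse
mouse_rate <- aggregate(pass_rate ~ mouse, data = cells, FUN = mean)

# Placeholder XX SC length estimate (not a measured value)
xx_mean_SC <- 11

# Adjustment factor = passing rate * (1/20 chance a bivalent is the XX) * XX mean_SC
mouse_rate$XX.adjustment.factor <- mouse_rate$pass_rate * (1 / 20) * xx_mean_SC
mouse_rate
```

Computing the rate per mouse rather than per strain keeps mice with unusually low or high passing rates from being averaged away.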

It might be simpler to compare the male and female means and test whether their difference is greater than the whole-cell proportion of the XX in female cells. The XX contributes ~7% of the total SC in a whole female cell, so if a female mean is a total-SC measure, roughly that share of it comes from the XX. But I am not using ‘whole cell’ summaries to compare females and males.

What is the effect of an extra XX-autosome on single bivalent means?

Use a permutation approach. Start with a ‘true’ data set with the same (or a similar) number of cells, mice, and bivalents. Make fake data sets that sample 19 bivalents to form ‘in silico’ cells for males and females. Also run a control female data set in which 20 bivalents are sampled at random. Run the same bivalent-level summaries on each permuted data set: male avSC, 19Female_avSC, and rand.20_Female_avSC. The difference between the rand.20 and rand.19 female permuted data sets should indicate the influence of having an extra ‘XX-autosome’ in the total data set.
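A simplified simulation of this idea is sketched here. Rather than resampling real bivalents, it draws SC lengths for 19 autosomal bivalents from assumed per-chromosome means and adds an XX bivalent placed as the 3rd longest; all lengths, `n_cells`, and `n_reps` are hypothetical. The gap between the 20-bivalent and 19-bivalent means estimates the shift caused by the extra ‘XX-autosome’.

```r
set.seed(1)
n_cells <- 50   # assumed number of in silico cells per simulated data set
n_reps  <- 200  # number of simulated data set pairs

# Assumed mean SC lengths (arbitrary units) for 19 autosomes, longest first,
# plus an XX placed between the 2nd and 3rd autosome, i.e. 3rd longest overall
autosome_means <- seq(12, 5, length.out = 19)
xx_mean <- mean(autosome_means[2:3])

# One row per cell, one column per bivalent, with unit-sd noise per bivalent
simulate_cells <- function(include_xx) {
  means <- if (include_xx) c(autosome_means, xx_mean) else autosome_means
  matrix(rnorm(n_cells * length(means),
               mean = rep(means, each = n_cells), sd = 1),
         nrow = n_cells)
}

# Difference in mean single-bivalent SC length: 20-bivalent minus 19-bivalent
diffs <- replicate(n_reps, {
  av19 <- mean(simulate_cells(include_xx = FALSE))
  av20 <- mean(simulate_cells(include_xx = TRUE))
  av20 - av19
})
summary(diffs)
```

With these placeholder values, the expected shift is (xx_mean - mean(autosome_means)) / 20, about 0.146, so `mean(diffs)` should land near that.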

Note on Heterochiasmy Definition

I present heterochiasmy as a comparison of oocyte to spermatocyte MLH1 counts, but the sex chromosomes/bivalents complicate this comparison. In females, the XX bivalent is indistinguishable from the autosomes; to the meiotic recombination machinery it is an autosome and has a similar REC landscape. In spermatocytes, by contrast, the XY bivalent is visually distinct, and any MLH1 foci on it were not included in the count. (I note whether the X and Y are paired, which they are at a high rate.) The XY pair triggers a response to unpaired chromosomes and only has MLH1 foci within the PAR (the tips of the X and Y). To make a more equivalent comparison, I will estimate which bivalent is the XX in oocytes and subtract its average REC from the category average of each strain.

  1. Compile full-cell data from females (all 20 bivalents measured).
  2. Look at the SC length-ranked data, extract the 3rd longest bivalent, and estimate its average REC.
  3. Check how variable the REC is across the 1st, 2nd, 4th, and 5th bivalents.
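Steps 2 and 3 can be sketched on simulated full-cell data. The data frame `full_cells` is hypothetical, with 20 bivalents per cell and MLH1 counts drawn from a Poisson distribution purely for illustration. Bivalents are ranked by SC length within each cell, the mean MLH1 count of the rank-3 bivalent is taken as the XX estimate, its spread is compared with ranks 1, 2, 4, and 5, and it is subtracted from the mean per-cell total.

```r
# Simulated placeholder data (hypothetical): 6 full cells x 20 bivalents
set.seed(2)
n_cells <- 6
full_cells <- data.frame(
  cell      = rep(paste0("cell", seq_len(n_cells)), each = 20),
  SC_length = runif(20 * n_cells, min = 4, max = 14),
  MLH1      = rpois(20 * n_cells, lambda = 1.2)
)

# Rank bivalents within each cell by SC length (1 = longest)
full_cells$rank <- ave(-full_cells$SC_length, full_cells$cell, FUN = rank)

# Step 2: mean MLH1 for each of the five longest ranks; rank 3 is the putative XX
top5 <- full_cells[full_cells$rank <= 5, ]
rec_by_rank <- aggregate(MLH1 ~ rank, data = top5, FUN = mean)
xx_rec <- rec_by_rank$MLH1[rec_by_rank$rank == 3]

# Step 3: spread of MLH1 counts at each of the five longest ranks
sd_by_rank <- tapply(top5$MLH1, top5$rank, sd)

# Subtract the XX estimate from the mean per-cell MLH1 total
cell_totals <- tapply(full_cells$MLH1, full_cells$cell, sum)
adjusted_mean <- mean(cell_totals) - xx_rec
adjusted_mean
```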

According to the mouse genome website, the X is the 3rd largest chromosome by total amount of DNA (Mb).

(Put the XX adjustment section here)

MOLF is now included, and it has female-biased heterochiasmy. Three of my Musc strains (SKIVE, PWD, and MSM) have a male-biased pattern, and one Musc strain (KAZ) has female-biased heterochiasmy.

The mouse-specific scatter plots aren't shown here because they are too bulky. These plots are in a different document.

Making all of these scatter plots allows us to look at the whole distribution of the data for each mouse. The distance of the red line from the black line could be an indicator of slides or mice with slide-specific technical noise.

Deleted notes

References